Retrospective Simulation: Make your Backtests more Realistic
- Nikhil Adithyan
Mastering trading strategy backtesting with multiple price paths

In quantitative trading, optimising a strategy on all available data is risky: the rules can become so finely tuned to that history that they fail in real trading. To get a better idea of how your strategy really behaves, divide your historical data into a training (or optimisation) period and a test period. This lets you evaluate the strategy on unseen data, much as in real market conditions, which helps expose overfitting and gives you a clearer picture of its strength across different market regimes.
For more robust backtesting, we will also use price paths simulated with a non-parametric Brownian bridge to assess a trading strategy’s resilience. Instead of relying on a single historical sequence, this method generates multiple paths that retain key statistical features of the original. Testing the strategy across these paths shows how it performs in a variety of market scenarios, which reduces the risk of overfitting and gives a more thorough view of its consistency and real-world potential.
What to expect in this article:
Get 20 years of Apple stock data up to today
Develop our two-moving-average strategy: when the fast MA is above the slow one we go long, and when it is below we go short.
Optimise the strategy for the first 15 years and see which parameters of the moving averages produce the best return.
Check the results of the optimised parameters for the last 5 years
Simulate 1,000 alternative price paths for those 15 years
Optimise our strategy for all those alternate paths
Discuss the results
The aim of this article isn’t to give you a perfect, ready-to-go algorithm that will make you rich overnight. Instead, it’s about helping you understand a different approach that you can smoothly incorporate into your backtest strategy. I hope you find it helpful and inspiring!
Retrospective Simulation
Before we dive into the code, let’s briefly discuss retrospective simulation. This technique models alternate price paths based on actual historical data. As mentioned earlier, this article will focus on the non-parametric Brownian bridge method. Other methods also exist, with the most well-known being:
Traditional Monte Carlo simulation, which generates random price paths assuming a specified stochastic model like geometric Brownian motion,
The Euler-Maruyama method, which uses discrete time steps to approximate stochastic differential equations for simulating price processes,
There are more advanced techniques like the Brownian Bridge Maximum Method, Quadratic-Exponential schemes, and Multidimensional Scaled Brownian Bridge. These methods are designed to enhance accuracy and better capture complex features such as volatility clustering or correlations between multiple assets.
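To make the first two methods in the list concrete, here is a minimal sketch of a Monte Carlo simulation of geometric Brownian motion, stepped forward with an Euler-Maruyama discretisation. The function name `simulate_gbm_paths` and every parameter value (drift, volatility, number of steps) are assumptions chosen for illustration; nothing here is used later in the article.

```python
import numpy as np

def simulate_gbm_paths(s0, mu, sigma, n_steps, n_paths, dt=1 / 252, seed=0):
    """Simulate GBM price paths with an Euler-Maruyama discretisation.

    Discretises dS = mu * S * dt + sigma * S * dW in steps of length dt.
    """
    rng = np.random.default_rng(seed)
    paths = np.empty((n_steps + 1, n_paths))
    paths[0] = s0
    for t in range(1, n_steps + 1):
        # Brownian increments have mean 0 and standard deviation sqrt(dt)
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        paths[t] = paths[t - 1] * (1 + mu * dt + sigma * dW)
    return paths

# Illustrative parameters only: 8% annual drift, 25% annual volatility, one year of daily steps
gbm_paths = simulate_gbm_paths(s0=100.0, mu=0.08, sigma=0.25, n_steps=252, n_paths=500)
print(gbm_paths.shape)
```

Unlike the bridge approach used in the rest of this article, these paths depend on an assumed model and are free to end anywhere, rather than being tied to the observed history.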
Choosing the best simulation method really depends on the strategy you’re testing, the amount of computational resources you have, and how complex the model needs to be. Retrospective simulation is especially helpful because it allows you to test strategies against many different versions of historical data, which can help prevent overfitting. This way, you can feel more confident that your strategies are robust before putting real capital on the line.
We chose the non-parametric Brownian bridge method for this article because it preserves the key statistical properties of the historical price data while generating alternative price paths. It is also light on computational resources, which makes it a good starting point.
Let’s Code
First and most important, let’s see our imports, as well as the parameters we will need:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests
from itertools import product
from tqdm import tqdm
token = 'YOUR FMP TOKEN'
from_date_train = '2005-10-31'
to_date_train = '2020-10-31'
from_date_test = '2020-11-01'
to_date_test = '2025-10-31'
fast_period = 21
slow_period = 55
fast_range = range(5, 46, 5)
slow_range = range(50, 251, 10)
Besides the FMP token, to get the prices for AAPL, we will need:
The dates that will be necessary for the testing
Some basic parameters for the two MAs
The ranges that we will use for our optimisation
Now that we have all this, let’s get the AAPL prices. We will do it with the FMP Daily Chart EOD API. You will notice that we will request the dates from the beginning of our training till the end of our testing.
ticker = 'AAPL'
url = f'https://financialmodelingprep.com/api/v3/historical-price-full/{ticker}'
querystring = {"apikey": token, "from": from_date_train, "to": to_date_test}
response = requests.get(url, params=querystring)
response.raise_for_status()  # stop early on an HTTP error
data = response.json()
df = pd.DataFrame(data['historical'])
df['date'] = pd.to_datetime(df['date'])
# Sort oldest to newest and index by date so we can slice by date range later
df = df.sort_values('date').set_index('date')
Traditional Backtesting and Optimisation
As we promised, let’s first develop our strategy and backtest it with the basic data.
def sma_strategy_backtest(close, fast_period, slow_period):
    df = pd.DataFrame({'close': close})
    df['pct_change'] = df['close'].pct_change()
    df['fast_sma'] = df['close'].rolling(window=fast_period).mean()
    df['slow_sma'] = df['close'].rolling(window=slow_period).mean()
    # Generate signal
    df['signal'] = 0
    df.loc[(df['fast_sma'] > df['slow_sma']), 'signal'] = 1
    df.loc[(df['fast_sma'] < df['slow_sma']), 'signal'] = -1
    # Calculate returns with shift to avoid lookahead bias
    df['strategy_return'] = df['pct_change'] * df['signal'].shift(1)
    df['equity'] = 100 * (1 + df['strategy_return']).cumprod()
    # Calculate Buy and Hold total return in percentage
    df['bnh_equity'] = 100 * (1 + df['pct_change']).cumprod()
    bnh_total_ret = (df['bnh_equity'].iloc[-1] / df['bnh_equity'].dropna().iloc[0] - 1) * 100
    # Strategy total return
    equity = df['equity']
    total_ret = (equity.iloc[-1] / equity.dropna().iloc[0] - 1) * 100
    return equity, total_ret, bnh_total_ret
The backtesting will be performed solely on the close price, generating the signal based on the alignment of the moving averages as previously explained. The return, and ultimately the equity, will be calculated based on the signal. Finally, it will produce the series of the equity, the total return, as well as the buy-and-hold return to provide a point of reference.
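The `shift(1)` in the backtest is what keeps it honest: a signal computed from today’s close can only be traded from the next bar onward. The toy example below, with made-up prices and signals, is a small sketch of how much the result can change if that shift is forgotten.

```python
import pandas as pd

# Made-up prices and positions, for illustration only
close = pd.Series([100.0, 102.0, 101.0, 105.0])
signal = pd.Series([1, 1, -1, 1])  # position decided at each day's close

daily_ret = close.pct_change()
with_lookahead = daily_ret * signal          # wrong: today's signal earns today's return
no_lookahead = daily_ret * signal.shift(1)   # right: yesterday's signal earns today's return

print(pd.DataFrame({'ret': daily_ret,
                    'lookahead': with_lookahead,
                    'shifted': no_lookahead}))
```

On the third day the price falls. The unshifted version pretends we were already short and books a gain, while the shifted version correctly books the loss of the long position held from the day before.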
Now we will run this with the base parameters we defined initially and print the results:
equity,total_ret, bnh_total_ret = sma_strategy_backtest(df['close'], fast_period, slow_period)
print("Total return (%):", total_ret)
print("Buy and Hold return (%):", bnh_total_ret)

The returns are positive, but they don’t come close to those of a Buy-and-Hold strategy. However, as mentioned, this article isn’t about identifying the most profitable approach but rather about illustrating the backtesting process using alternative methods.
Let’s fine-tune our strategy (also known as overfitting ;) ) to discover what our results will be.
def optimize_sma_periods(close, fast_range, slow_range):
    best_result = {'fast': None, 'slow': None, 'total_return': -np.inf}
    best_equity = None
    buy_and_hold = None
    # Iterate over valid combinations: fast < slow
    for fast, slow in product(fast_range, slow_range):
        if fast < slow:
            equity, total_ret, bnh_total_ret = sma_strategy_backtest(close, fast, slow)
            # Keep the combination with the highest total return seen so far
            if total_ret > best_result['total_return']:
                best_result = {'fast': fast, 'slow': slow, 'total_return': total_ret}
                best_equity = equity
                buy_and_hold = bnh_total_ret
    return {
        'best_periods': (best_result['fast'], best_result['slow']),
        'best_total_return': best_result['total_return'],
        'best_equity': best_equity,
        'buy_and_hold': buy_and_hold
    }
result = optimize_sma_periods(df['close'], fast_range, slow_range)
print("Best periods (fast, slow):", result['best_periods'])
print("Best total return (%):", result['best_total_return'])
print("Buy and Hold return (%):", result['buy_and_hold'])

We observe that the highest return comes from a very fast MA (10 days) and a relatively slow one (220 days). This is because the stock (like many large-cap stocks in recent years) has delivered tremendous returns, so the best-performing parameters are the ones that keep the strategy long for as much of the period as possible.
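A single best pair says little about how fragile that choice is. One way to look at it is to keep the return of every combination and lay them out as a grid, so you can see whether the winner sits on a broad plateau or is an isolated spike. The sketch below is self-contained and therefore uses assumptions: `sma_total_return` is a compact stand-in for the backtest function above, the price series is a synthetic random walk rather than AAPL, and the grid is coarser than the one used in the article.

```python
import numpy as np
import pandas as pd
from itertools import product

def sma_total_return(close, fast, slow):
    """Total return (%) of the long/short SMA crossover rule on one price series."""
    fast_sma = close.rolling(fast).mean()
    slow_sma = close.rolling(slow).mean()
    signal = np.sign(fast_sma - slow_sma).fillna(0)   # +1 long, -1 short, 0 during warm-up
    strat_ret = close.pct_change() * signal.shift(1)  # trade on the previous bar's signal
    return ((1 + strat_ret.fillna(0)).prod() - 1) * 100

# Synthetic placeholder prices, NOT real market data
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0005, 0.02, 1500))))

rows = [
    {'fast': f, 'slow': s, 'total_return': sma_total_return(close, f, s)}
    for f, s in product(range(5, 46, 10), range(50, 251, 50))
]
surface = pd.DataFrame(rows).pivot(index='fast', columns='slow', values='total_return')
print(surface.round(1))
```

If neighbouring cells show returns similar to the best one, the parameter choice is less likely to be a fluke of one particular history.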
Train and Test
Clearly, in the previous step we overfitted our parameters, and no experienced (or sane) trader would believe that those are the parameters to use with real money from tomorrow…
Let’s assume today is 5 years earlier, and that we have optimised our parameters using data up to that point. To do this, we will keep the first 15 years and run the same optimisation.
df_train = df.loc[from_date_train:to_date_train]
result = optimize_sma_periods(df_train['close'], fast_range, slow_range)
best_fast = result['best_periods'][0]
best_slow = result['best_periods'][1]
print("Best periods (fast, slow):", best_fast, best_slow)
print("Best total return (%):", result['best_total_return'])
print("Buy and Hold return (%):", result['buy_and_hold'])

Again, the best train parameters are 10 for fast and 220 for slow. Let’s see what this optimisation will yield for the next 5 years up to today…
df_test = df.loc[from_date_test:to_date_test]
equity, total_ret, bnh_total_ret = sma_strategy_backtest(df_test['close'], best_fast, best_slow)
print("Best periods applied (fast, slow):", best_fast, best_slow)
print("Total return (%):", total_ret)
print("Buy and Hold return (%):", bnh_total_ret)
Proportionately, the results are almost identical: the strategy earns a small return of 5%, while the stock itself more than doubled in price over the same period.
Alternative paths
There are many methods to compute alternative paths. In our case, we will use the non-parametric Brownian bridge framework, which, as previously mentioned, maintains the statistical features of the price history and ensures that every simulated path starts and ends at the same prices as the actual history.
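For readers who prefer notation, the construction can be written out as follows. This is a sketch of the formula as the code below appears to implement it. The symbols $N$ (the number of return steps), $r_k$ (the resampled log returns) and $W_t$ (their running sum) are introduced here for illustration and do not come from the original text.

```latex
\begin{align}
X_0 &= \log P_0, \qquad X_N = \log P_N, \qquad W_t = \sum_{k=1}^{t} r_k, \quad W_0 = 0 \\
X_t^{\mathrm{sim}} &= X_0 + W_t + \frac{t}{N}\,\bigl(X_N - X_0 - W_N\bigr), \qquad t = 0, 1, \dots, N \\
P_t^{\mathrm{sim}} &= \exp\bigl(X_t^{\mathrm{sim}}\bigr)
\end{align}
```

At $t = 0$ the correction term vanishes, so the path starts at $P_0$. At $t = N$ the two $W_N$ terms cancel, so the path ends at $P_N$. In between, the path follows the resampled returns.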
close_prices = df['close']
def non_parametric_brownian_bridge(close_prices, n_paths=1000, seed=42):
    np.random.seed(seed)
    n = len(close_prices)
    X0 = np.log(close_prices.iloc[0])
    Xn = np.log(close_prices.iloc[-1])
    log_returns = np.log(close_prices / close_prices.shift(1)).dropna().values
    paths = np.zeros((n, n_paths))
    for i in range(n_paths):
        # Sample n-1 historical returns with replacement (bootstrap)
        sampled = np.random.choice(log_returns, size=n - 1, replace=True)
        # Shift the sample so its mean matches the historical average log return
        drift_correction = (Xn - X0) / (n - 1) - np.mean(sampled)
        sampled += drift_correction  # Center drift
        W = np.concatenate(([0], np.cumsum(sampled)))  # Now length n
        # Brownian bridge formula for all time steps (n).
        # After the drift correction W[-1] already equals Xn - X0,
        # so the last term only absorbs floating-point error.
        bridge = X0 + W + np.linspace(0, 1, n) * (Xn - X0 - W[-1])
        paths[:, i] = bridge
    sim_prices = np.exp(paths)
    sim_prices[~np.isfinite(sim_prices)] = np.nan
    return sim_prices
simulated_paths = non_parametric_brownian_bridge(close_prices, n_paths=1000)
for i in range(simulated_paths.shape[1]):
    df[f'sim_path_{i+1}'] = simulated_paths[:, i]
plt.figure(figsize=(14, 7))
plt.plot(df.index, df.loc[:, 'sim_path_1':'sim_path_1000'], lw=1, alpha=0.7)
plt.plot(df.index, close_prices, lw=2, label='Original', color='black')
plt.title('Non-Parametric Brownian Bridge - Simulated Paths')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()

As you can see from the plot of the 1,000 simulated paths, they all begin and end at the same price. Notice the black line, which represents the actual history.
Now the fun starts. We will run every combination of parameters on each price path. With 9 fast and 21 slow values across 1,000 paths, that is 189,000 runs, so be patient.
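The pinned endpoints can also be confirmed numerically rather than by eye. The sketch below is self-contained, so it makes some assumptions: `bridge_paths` is a simplified, vectorised stand-in for the function above (it skips the separate drift-centering step), and the "history" is a synthetic random walk rather than AAPL prices.

```python
import numpy as np

def bridge_paths(log_prices, n_paths=200, seed=0):
    """Minimal bootstrap bridge: resample log returns, then pin both endpoints."""
    rng = np.random.default_rng(seed)
    n = len(log_prices)
    returns = np.diff(log_prices)
    sampled = rng.choice(returns, size=(n - 1, n_paths), replace=True)
    W = np.vstack([np.zeros(n_paths), np.cumsum(sampled, axis=0)])  # shape (n, n_paths)
    weights = np.linspace(0, 1, n)[:, None]                         # 0 at start, 1 at end
    gap = log_prices[-1] - log_prices[0] - W[-1]                    # per-path endpoint miss
    return np.exp(log_prices[0] + W + weights * gap)

# Synthetic placeholder "history", NOT real market data
rng = np.random.default_rng(1)
hist = 100 * np.exp(np.cumsum(rng.normal(0.0004, 0.015, 750)))
sims = bridge_paths(np.log(hist))

starts_match = np.allclose(sims[0], hist[0])
ends_match = np.allclose(sims[-1], hist[-1])
print(starts_match, ends_match)

# Compare the volatility of simulated and historical daily log returns
print(np.diff(np.log(sims), axis=0).std(), np.diff(np.log(hist)).std())
```

The second print is a rough check that the simulated paths keep a daily volatility close to that of the source series, which is the property the method is meant to preserve.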
df_train_multiple_paths = df.loc[from_date_train:to_date_train]
results = []
for i in tqdm(range(1,1001,1)):
    print(f'Processing path {i}')
    for fast, slow in product(fast_range, slow_range):
        _, total_backtest_ret, _ = sma_strategy_backtest(df_train_multiple_paths['sim_path_' + str(i)], fast, slow)
        result = {'fast': fast, 'slow': slow, 'total_return': total_backtest_ret}
        results.append(result)
df_all_paths_train = pd.DataFrame(results)
df_all_paths_train.to_csv('df_all_paths_train_2.csv', index=False)
df_all_paths_train

For each run, we will also calculate the actual return over the last 5 years. As you will see in the code, we will not calculate for each row (since the combination of the MA parameters repeats). Instead, we will compute all the unique combinations first and then merge them into the final dataframe.
unique_combos = (
    df_all_paths_train[['fast', 'slow']]
    .drop_duplicates()
    .copy()
)
unique_combos[['fast', 'slow']] = unique_combos[['fast', 'slow']].astype(int)
def _compute_test_metrics(row):
    f, s = int(row['fast']), int(row['slow'])
    _, total_ret, bnh_ret = sma_strategy_backtest(df_test['close'], f, s)
    return pd.Series({'test_total_return': total_ret, 'test_bnh_return': bnh_ret})
# Evaluate each unique combo once
unique_combos[['test_total_return', 'test_bnh_return']] = unique_combos.apply(_compute_test_metrics, axis=1)
# Join back to all rows to align with every path's chosen combo
df_all_paths_with_test = df_all_paths_train.merge(unique_combos, on=['fast', 'slow'], how='left')
df_all_paths_with_test

Analysing the results
Now that we have our dataframe with all the results, let’s try some plots to make some sense out of all this effort.
Our first try will be a 3D scatter plot, where we will use the two MA lengths as well as the final return in the test period (the last 5 years).
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(10, 7))
ax = fig.add_subplot(111, projection='3d')
# Scatter plot with fast, slow, and test_total_return
scatter = ax.scatter(df_all_paths_with_test['fast'],
                     df_all_paths_with_test['slow'],
                     df_all_paths_with_test['test_total_return'],
                     c=df_all_paths_with_test['test_total_return'],
                     cmap='viridis',
                     alpha=0.7)
ax.set_xlabel('Fast MA Length')
ax.set_ylabel('Slow MA Length')
ax.set_zlabel('Test Period Return')
fig.colorbar(scatter, label='Test Period Return')
plt.title('3D Scatter Plot of MA Parameters vs Test Total Return')
plt.show()

3D plots can be hard to read. However, upon closer inspection, you’ll notice that the back corner of the plot indicates that we should expect better returns with very fast and very slow MAs, which also supports our initial findings.
Which brings us to the following plot, where we will use boxplots to distinguish the two MAs. We will also bin the MAs for a better visualisation.
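One detail of `pd.cut` is worth knowing before binning a parameter grid: by default its intervals are open on the left, so a value that sits exactly on the lowest bin edge is not assigned to any bin. The toy values below are made up for illustration and show the effect, along with the `include_lowest=True` option that closes the first interval.

```python
import numpy as np
import pandas as pd

values = pd.Series([5, 10, 15, 20])   # toy parameter values
edges = np.arange(5, 25, 5)           # bin edges 5, 10, 15, 20

default_bins = pd.cut(values, edges)                      # intervals are (left, right]
closed_bins = pd.cut(values, edges, include_lowest=True)  # first interval also contains 5

n_missing_default = int(default_bins.isna().sum())
n_missing_closed = int(closed_bins.isna().sum())
print(n_missing_default, n_missing_closed)  # 1 0
```

With the default setting the smallest value silently drops out, which in a boxplot means the smallest MA length would simply be missing from the chart.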
# Define bins for fast and slow parameters (customize ranges as needed)
fast_bins = np.arange(df_all_paths_with_test['fast'].min(),
                      df_all_paths_with_test['fast'].max() + 5, 5)
slow_bins = np.arange(df_all_paths_with_test['slow'].min(),
                      df_all_paths_with_test['slow'].max() + 10, 10)
# Create binned columns for fast and slow.
# include_lowest=True keeps the smallest value (which sits on the first bin edge) in the first bin
df_all_paths_with_test['fast_bin'] = pd.cut(df_all_paths_with_test['fast'], fast_bins, include_lowest=True)
df_all_paths_with_test['slow_bin'] = pd.cut(df_all_paths_with_test['slow'], slow_bins, include_lowest=True)
fig, axes = plt.subplots(1, 2, figsize=(16, 6), sharey=True)
# Boxplot for fast parameter bins
df_all_paths_with_test.boxplot(column='test_total_return', by='fast_bin', ax=axes[0], grid=False)
axes[0].set_title('Test Returns by Fast MA Length')
axes[0].set_xlabel('Fast MA Length Range')
axes[0].set_ylabel('Test Period Return')
axes[0].tick_params(axis='x', rotation=45)
# Boxplot for slow parameter bins
df_all_paths_with_test.boxplot(column='test_total_return', by='slow_bin', ax=axes[1], grid=False)
axes[1].set_title('Test Returns by Slow MA Length')
axes[1].set_xlabel('Slow MA Length Range')
axes[1].tick_params(axis='x', rotation=45)
plt.suptitle('')  # Remove default pandas title
plt.tight_layout()
plt.show()

This will give us some more insights:
Regarding the Fast MA, our initial results are once again confirmed. The returns are better when using a range of 5 to 10 periods for a fast MA.
Regarding the Slow MA box plots, they provide additional insights. We observe that returns tend to be good with some “faster” slow MAs in the range of 50 to 100. However, in this area, we also notice the most outliers (the dots), which is undesirable since it indicates a higher risk.
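The dots in a boxplot follow a simple rule: by default, matplotlib draws any point further than 1.5 times the interquartile range from the box as an outlier. If you would rather count them than eyeball them, something along these lines would do it. The helper name `count_boxplot_outliers` and the toy numbers are assumptions made for the example, not results from this article.

```python
import pandas as pd

def count_boxplot_outliers(df, group_col, value_col):
    """Count points beyond 1.5 * IQR from the quartiles, per group."""
    def _count(values):
        q1, q3 = values.quantile([0.25, 0.75])
        iqr = q3 - q1
        is_outlier = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
        return int(is_outlier.sum())
    return df.groupby(group_col)[value_col].apply(_count)

# Toy data: the first group contains one extreme value, the second none
toy = pd.DataFrame({
    'slow': [50] * 6 + [200] * 6,
    'test_total_return': [1, 2, 3, 4, 5, 40, 1, 2, 3, 4, 5, 6],
})
outliers = count_boxplot_outliers(toy, 'slow', 'test_total_return')
print(outliers)
```

Applied to the binned results, a table like this makes it easier to compare how many extreme outcomes each parameter range produces.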
Another interesting plot is a heatmap that shows the risk of overfitting. This is achieved by calculating an overfitting metric, which is the difference between train and test returns. Let’s look at that:
import seaborn as sns
# Calculate overfitting metric
df_all_paths_with_test['overfit'] = df_all_paths_with_test['total_return'] - df_all_paths_with_test['test_total_return']
# Group by fast and slow and aggregate overfit by mean (or median if preferred)
agg_df = df_all_paths_with_test.groupby(['fast', 'slow'])['overfit'].mean().reset_index()
# Pivot the aggregated DataFrame
heatmap_data = agg_df.pivot(index='fast', columns='slow', values='overfit')
plt.figure(figsize=(12, 8))
sns.heatmap(heatmap_data, cmap='coolwarm', center=0,
            cbar_kws={'label': 'Overfitting Risk (Train - Test Return)'},
            linewidths=0.5)
plt.title('Heatmap of Overfitting Risk by MA Parameters')
plt.xlabel('Slow MA Length')
plt.ylabel('Fast MA Length')
plt.show()

Well, that explains everything. The very fast and very slow MAs, where the returns looked best, are also where the risk of overfitting is most concentrated. In these zones the gap between training returns and test returns is the greatest: they generate the best results during training, but much of that performance does not carry over to the test period.
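One caveat when reading this metric: the training window covers 15 years and the test window only 5, so raw total returns are not on the same scale. A possible refinement, shown here only as a sketch, is to annualise both numbers before taking the difference. The helper `annualised_return` and the figures passed to it are illustrative assumptions, not results from this article.

```python
def annualised_return(total_return_pct, years):
    """Convert a total return in percent into a compound annual growth rate in percent.

    Assumes the total return is above -100%, so the growth factor is positive.
    """
    growth = 1 + total_return_pct / 100
    return (growth ** (1 / years) - 1) * 100

# Illustrative numbers only
train_cagr = annualised_return(300.0, 15)  # 300% earned over 15 years
test_cagr = annualised_return(50.0, 5)     # 50% earned over 5 years
print(round(train_cagr, 2), round(test_cagr, 2))
print('annualised gap:', round(train_cagr - test_cagr, 2))
```

On a per-year basis the gap between two periods can look very different from the gap in total returns, so it is worth checking both before drawing conclusions about overfitting.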
Final Thoughts
What we have learned in this article:
Dividing data into training and testing periods provides a realistic assessment of strategy robustness.
Using non-parametric Brownian bridge simulations generates multiple price paths, testing the strategy against diverse market scenarios.
Simulated paths offer deeper insight into consistency and risk, enhancing confidence in the strategy’s real-world application.
And last but not least, when trading with real money, remember: backtest like your profits depend on it — because they do! The more you test, the less you guess, and the happier your portfolio will be.