
The Hidden Cost Killers: Why Your Backtest Lies

  • Writer: Nikhil Adithyan
  • 7 days ago
  • 10 min read

A realistic guide to backtesting with Python



Backtesting is crucial in developing quantitative trading strategies, offering a data-driven way to validate ideas and evaluate performance under historical market conditions. However, many underestimate the complexity of replicating real execution, often ignoring impacts like slippage, bid-ask spread, and trading costs. These omissions can inflate performance metrics and reduce robustness when moving from simulation to live trading.


Including execution frictions and market microstructure effects ensures backtest results reflect realistic, practical strategies rather than idealised models. The most important components that we often overlook, and which we are going to analyse in this article, are:


  • Slippage: The difference between the expected trade price and the actual execution price. It occurs due to market volatility and order size, often eroding backtested profitability.

  • Bid‑Ask Spread: The transaction cost embedded in the difference between buying and selling prices. Ignoring spreads can lead to overly optimistic returns, especially in less‑liquid instruments.

  • Trading Costs: Brokerage fees, exchange commissions, and financing costs accumulate over time, materially affecting net performance. Neglecting them produces unrealistically high returns and distorts risk‑adjusted ratios.


In this article, we will use Python for backtesting a relatively simple strategy, experiment with the above components, and finally present and discuss the results of our backtesting.
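Before diving in, here is a back-of-the-envelope sketch (all numbers hypothetical) of what a single buy order might really cost once the three frictions above are applied:

```python
# Hypothetical example: buy 100 shares quoted at a $200.00 mid price,
# with a 0.05% spread, 0.05% slippage, and 0.1% commission
mid_price = 200.00
shares = 100

half_spread = mid_price * 0.0005 / 2   # crossing the spread costs half of it
slippage = mid_price * 0.0005          # price moves against us before the fill
fill_price = mid_price + half_spread + slippage

commission = fill_price * shares * 0.001
total_drag = (fill_price - mid_price) * shares + commission

print(f"fill price: {fill_price:.2f}")   # 200.15 instead of 200.00
print(f"total drag: ${total_drag:.2f}")  # roughly $35 on a $20,000 order
```

Fifteen cents per share sounds negligible, but repeated over dozens of round trips it compounds into exactly the kind of drag this article measures.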


Extracting Intraday Data

First, we will perform our imports and provide the functions to retrieve the prices using EODHD’s Historical Intraday API.



import itertools
from datetime import timedelta

import numpy as np
import pandas as pd
import requests
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

token = 'YOUR EODHD API KEY'

def get_prices(symbol: str, from_date: str, to_date: str) -> pd.DataFrame:
    url = f'https://eodhd.com/api/intraday/{symbol}'
    from_ts = int(pd.to_datetime(from_date, utc=True).timestamp())
    to_ts = int(pd.to_datetime(to_date, utc=True).timestamp())
    # Pass both bounds; previously to_ts was computed but never sent to the API
    params = {'api_token': token, 'fmt': 'json', 'interval': '5m',
              'from': from_ts, 'to': to_ts}
    resp = requests.get(url, params=params)
    resp.raise_for_status()
    df = pd.DataFrame(resp.json())
    if df.empty:
        return pd.DataFrame(columns=['open', 'low', 'high', 'close', 'volume'])
    df['datetime'] = pd.to_datetime(df['datetime'])
    df.sort_values(by='datetime', inplace=True)
    df.set_index('datetime', inplace=True)
    df = df[['open', 'low', 'high', 'close', 'volume']]
    return df

def get_prices_day_by_day(symbol: str, start_date: str, end_date: str) -> pd.DataFrame:
    start_dt = pd.to_datetime(start_date).normalize()
    end_dt = pd.to_datetime(end_date).normalize()

    all_prices = []
    current_dt = start_dt

    while current_dt <= end_dt:
        from_str = current_dt.strftime('%Y-%m-%d')
        day_df = get_prices(symbol, from_str, end_date)

        if day_df.empty:
            break

        day_df = day_df[day_df.index.normalize() <= end_dt]
        day_df = day_df[day_df.index.normalize() >= current_dt]

        if not day_df.empty:
            all_prices.append(day_df)

        max_dt = day_df.index.max().normalize() if not day_df.empty else None
        if max_dt is None or max_dt <= current_dt:
            break

        current_dt = max_dt + timedelta(days=1)

    if not all_prices:
        return pd.DataFrame(columns=['open', 'low', 'high', 'close', 'volume'])

    df = pd.concat(all_prices)
    df = df[~df.index.duplicated(keep='first')]
    df.sort_index(inplace=True)
    return df

# Note: despite its name, df_1min holds 5-minute candles (the interval requested above)
df_1min = get_prices_day_by_day("AAPL.US", "2025-01-01", "2026-01-01")
df_1min

Note: Replace YOUR EODHD API KEY with your actual EODHD API key. If you don’t have one, you can obtain it by opening an EODHD developer account.


You will notice that after our imports, we are creating two separate functions.


get_prices: fetches the 5-minute candles of a stock for a specific period using the intraday price data API.


get_prices_day_by_day: loops through the days between a defined start and end date, concatenating all the results into a single dataframe.


The final dataframe should look like this:



The reason we are doing it this way is that historical data APIs typically cap the number of rows returned per request. In our case, we will backtest the whole of 2025, which means around 20K five-minute candles, far too many for a single call. That is why we fetch the data in successive day-by-day calls.


We use the 5-minute timeframe because we will treat it as a live price feed, while the strategy itself operates on the 4-hour timeframe. For this reason, we will use pandas' resample function to compute the 4-hour bars as well.



resampled = df_1min.resample('4h').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum'
}).dropna()

resampled['ma_fast_4h'] = resampled['close'].rolling(window=10, min_periods=1).mean()
resampled['ma_slow_4h'] = resampled['close'].rolling(window=50, min_periods=1).mean()

# Forward-fill the completed 4-hour values onto the 5-minute index
df_1min['close_4h'] = resampled['close'].reindex(df_1min.index, method='ffill')
df_1min['volume_4h'] = resampled['volume'].reindex(df_1min.index, method='ffill')
df_1min['MA_fast'] = resampled['ma_fast_4h'].reindex(df_1min.index, method='ffill')
df_1min['MA_slow'] = resampled['ma_slow_4h'].reindex(df_1min.index, method='ffill')
df_1min['signal'] = np.where(df_1min['MA_fast'] > df_1min['MA_slow'], 1,
               np.where(df_1min['MA_fast'] < df_1min['MA_slow'], -1, 0))
df_1min

In order to keep everything clean and easy to understand, we are going to keep all the data that we need in the initial 5-minute dataframe. Before we explain the code, let’s talk about the strategy.


It will be a simple crossover strategy with two moving averages, a fast one (period 10) and a slow one (period 50). When the fast is above the slow, we consider the market to be uptrending and take a long position; in the opposite scenario, we take a short one.

Now let’s explain the code:


  • Firstly, using the resample function, we will calculate the 4-hour candles in a separate dataframe.

  • Then we will calculate the two moving averages and the strategy signal, as explained above.

  • Finally, we will add that data to the initial 5-minute dataframe.


Our dataframe should look like the following:



As you notice, we still have the 5-minute candles with the extra 4-hour columns. That is why you will see that the 4-hour columns are repetitive and will only change every 4 hours.
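As a tiny standalone illustration (with hypothetical timestamps and values), this is how reindex with forward-fill stamps each 5-minute row with the most recent 4-hour value, producing exactly that repetitive pattern:

```python
import pandas as pd

# Two hypothetical 4-hour values and a 5-minute index spanning them
idx_5m = pd.date_range('2025-01-02 09:30', periods=40, freq='5min')  # 09:30 to 12:45
vals_4h = pd.Series([100.0, 101.0],
                    index=pd.to_datetime(['2025-01-02 08:00',
                                          '2025-01-02 12:00']))

# Every 5-minute stamp carries the latest 4-hour value at or before it
filled = vals_4h.reindex(idx_5m, method='ffill')
print(filled.iloc[0], filled.iloc[-1])  # 100.0 before noon, 101.0 after
```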


Backtesting Time!

Now that we have our data to run our strategy, it is time to develop the actual backtesting engine.



def backtest(df, initial_capital=100000, commission_pct=0.001,
             spread_pct=0.0005, slippage_base_pct=0.0005):

    capital = initial_capital
    position = 0.0
    entry_price = 0.0
    entry_time = None
    trades = []

    def trade_costs(price, size):
        return abs(size * price * commission_pct)

    def adjust_price(price, direction, volume_4h, spread_pct, slippage_base_pct):
        # Volume slippage
        vol_norm = max(volume_4h / 20_000_000, 0.1)
        slippage_pct = slippage_base_pct / vol_norm
        slippage_pct = min(slippage_pct, 0.005)

        half_spread = price * spread_pct / 2
        slippage = price * slippage_pct

        # Buy high (pay ask), sell low (receive bid)
        if direction > 0:  # Buy / long entry
            return price + half_spread + slippage
        else:  # Sell / short entry or long exit
            return price - half_spread - slippage

    sig = 0
    for idx, row in df.iterrows():
        prev_sig = sig
        sig = row['signal']
        price_open = row['open']

        # Exit when the signal flips against the open position
        if position != 0 and sig * position < 0:
            direction_out = -1 if position > 0 else 1
            adj_exit = adjust_price(row['close'], direction_out, row['volume_4h'], spread_pct, slippage_base_pct)
            pnl_gross = position * (adj_exit - entry_price)
            costs = trade_costs(adj_exit, abs(position))
            pnl_net = pnl_gross - costs
            capital += pnl_net

            trades.append({
                'entry_time': entry_time, 'exit_time': idx,
                'entry_price': entry_price, 'exit_price': adj_exit,
                'position_size': position, 'pnl_gross': pnl_gross,
                'costs': costs, 'pnl_net': pnl_net, 'capital_after': capital
            })
            position = 0

        # Enter new position
        if position == 0 and sig != prev_sig and sig != 0:
            direction = sig
            adj_entry = adjust_price(price_open, direction, row['volume_4h'], spread_pct, slippage_base_pct)
            position_size = capital / adj_entry  # Deploy full capital
            entry_costs = trade_costs(adj_entry, abs(position_size))
            if capital > entry_costs:
                position = position_size * direction
                entry_price = abs(adj_entry)
                entry_time = idx
                capital -= entry_costs

    # Close final position
    if position != 0:
        last_row = df.iloc[-1]
        direction_out = -1 if position > 0 else 1
        adj_exit = adjust_price(last_row['close'], direction_out, last_row['volume_4h'], spread_pct, slippage_base_pct)
        pnl_gross = position * (adj_exit - entry_price)
        costs = trade_costs(adj_exit, abs(position))
        pnl_net = pnl_gross - costs
        capital += pnl_net
        trades.append({
            'entry_time': entry_time, 'exit_time': df.index[-1],
            'entry_price': entry_price, 'exit_price': adj_exit,
            'position_size': position, 'pnl_gross': pnl_gross,
            'costs': costs, 'pnl_net': pnl_net, 'capital_after': capital
        })

    trades_df = pd.DataFrame(trades)

    # Performance metrics
    total_return = (capital - initial_capital) / initial_capital * 100
    num_trades = len(trades_df)
    win_rate = (trades_df['pnl_net'] > 0).mean() * 100 if num_trades > 0 else 0
    avg_win = trades_df[trades_df['pnl_net'] > 0]['pnl_net'].mean() if win_rate > 0 else 0
    avg_loss = trades_df[trades_df['pnl_net'] < 0]['pnl_net'].mean() if (100 - win_rate) > 0 else 0
    profit_factor = abs(avg_win * (win_rate / 100) / (abs(avg_loss) * ((100 - win_rate) / 100))) if avg_loss != 0 else float('inf')

    metrics = {
        'final_capital': capital,
        'total_return_pct': total_return,
        'num_trades': num_trades,
        'win_rate_pct': win_rate,
        'avg_win': avg_win,
        'avg_loss': avg_loss,
        'profit_factor': profit_factor,
        'total_costs': trades_df['costs'].sum()
    }

    return trades_df, metrics

As you will notice, we have a function that takes our initial capital and the percentages of the components we discussed earlier: commission, spread, and slippage.


The code iterates bar by bar, checking whether a position is already open and, depending on the signal, opens a trade, closes one, or does nothing if the signal has not changed.


In the end, it will return a detailed dataframe with the trades and some result metrics of the strategy, such as return, number of trades, winning percentage, etc.


Let’s run it with some basic values:



trades_df, metrics = backtest(
    df_1min,
    initial_capital=50000,
    commission_pct=0.0005,
    spread_pct=0.0002,
    slippage_base_pct=0.0001
)

metrics

The metrics returned are the following:



{'final_capital': np.float64(33776.36077208968),
 'total_return_pct': np.float64(-32.44727845582063),
 'num_trades': 29,
 'win_rate_pct': np.float64(37.93103448275862),
 'avg_win': np.float64(1132.900075678812),
 'avg_loss': np.float64(-1560.2231972177665),
 'profit_factor': np.float64(0.44373639954880745),
 'total_costs': np.float64(602.5884081034602)}

We see that our strategy took 29 trades, lost money with a return of minus 32%, and accumulated costs of $602. We should clarify that the strategy is clearly not an exciting one and it loses money, but the aim of this article is to demonstrate the impact of costs on a strategy, not to showcase a winning one.
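As a quick sanity check on the engine's bookkeeping, the reported profit factor can be reconstructed by hand from the win rate and the average win/loss (values copied from the metrics output above):

```python
# Values taken from the metrics dictionary printed above
win_rate = 37.93103448275862 / 100       # 11 winners out of 29 trades
avg_win = 1132.900075678812
avg_loss = -1560.2231972177665

# Profit factor = expected gross profit per trade / expected gross loss per trade
profit_factor = (avg_win * win_rate) / (abs(avg_loss) * (1 - win_rate))
print(round(profit_factor, 4))  # 0.4437, matching the reported value
```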


Cost Sensitivity Analysis: Running All Scenarios

Now is the time to run this backtesting engine with numerous possible cost combinations.



param_grid = {
    'commission_pct': [0.0, 0.0005, 0.001, 0.002],
    'spread_pct': [0.0, 0.0002, 0.0005, 0.001],
    'slippage_base_pct': [0.0, 0.0002, 0.0005, 0.001]
}

# Generate all combinations
scenarios = list(itertools.product(
    param_grid['commission_pct'],
    param_grid['spread_pct'],
    param_grid['slippage_base_pct']
))

For that reason, we created a parameter grid with every cost starting from zero, to establish a baseline for comparison. Note that the upper values, such as 0.2% for commission or 0.1% for spread and slippage, do exist in the market, so every scenario is realistic depending on your broker.
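Since each of the three cost parameters takes four values, the grid yields 4 × 4 × 4 = 64 scenarios, which we can confirm quickly:

```python
import itertools

param_grid = {
    'commission_pct': [0.0, 0.0005, 0.001, 0.002],
    'spread_pct': [0.0, 0.0002, 0.0005, 0.001],
    'slippage_base_pct': [0.0, 0.0002, 0.0005, 0.001]
}

# Dict insertion order is preserved, so this matches the explicit version above
scenarios = list(itertools.product(*param_grid.values()))
print(len(scenarios))  # 64 cost combinations, the first being the zero-cost baseline
```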


Now is the time to run the backtesting of all the scenarios:



results_list = []

for i, (comm_pct, spread_pct, slip_pct) in enumerate(scenarios):
    trades_df, metrics = backtest(
        df_1min,
        initial_capital=10000,
        commission_pct=comm_pct,
        spread_pct=spread_pct,
        slippage_base_pct=slip_pct 
    )

    scenario_metrics = {
        'commission_pct': comm_pct,
        'spread_pct': spread_pct,
        'slippage_base_pct': slip_pct, 
        **metrics
    }
    results_list.append(scenario_metrics)

    print(f"#{i+1:2d}: C{comm_pct:.4f} S{spread_pct:.4f} Slip{slip_pct:.4f} → "
          f"{metrics['total_return_pct']:+6.1f}% ({metrics['num_trades']:2d} trades)")

metrics_df = pd.DataFrame(results_list)
metrics_df

The code runs all the scenarios one by one, stores each scenario's metrics in a list, and finally converts the list into a dataframe.



The initial observation is that the trade count remains identical across all runs. This is logical: the signals are computed from prices alone, so costs could only change the trade count by exhausting capital before an entry, which does not happen here.


Let’s first compare the zero-cost scenario with a scenario that has costs around the middle of our possible range.



zero_cost = metrics_df[(metrics_df['commission_pct']==0) &
                       (metrics_df['spread_pct']==0) &
                       (metrics_df['slippage_base_pct']==0)]

realistic = metrics_df[(metrics_df['commission_pct']==0.001) &
                       (metrics_df['spread_pct']==0.0005) &
                       (metrics_df['slippage_base_pct']==0.0005)]

print(f"Gross edge:    {zero_cost['total_return_pct'].iloc[0]:+.1f}%")
print(f"Net realistic: {realistic['total_return_pct'].iloc[0]:+.1f}%")
print(f"Cost drag:     {zero_cost['total_return_pct'].iloc[0] - realistic['total_return_pct'].iloc[0]:+.1f}%")

This will return:



Gross edge:    -28.5%
Net realistic: -40.5%
Cost drag:     +12.0%

This indicates that when realistic levels of commission (0.1%), spread (0.05%), and slippage (0.05%) are included, the performance declines from -28.5% to -40.5%, adding a cost of +12.0 percentage points. This demonstrates that, even for a losing strategy, execution frictions worsen results, increase losses, and cause capital to erode more quickly than a basic PnL calculation would show.


Let’s now create three visualizations from the backtest results: a scatter plot illustrating how higher total costs are associated with lower returns, a line chart depicting the linear decline as commission rates increase, and a bar chart that compares average returns at different friction levels (ideal, retail, worst).



df = metrics_df.copy()
df['total_friction'] = df['commission_pct'] + df['spread_pct'] + df['slippage_base_pct']

fig, axes = plt.subplots(1,3, figsize=(15,4))

# 1. Cost Drag Explosion
axes[0].scatter(df['total_costs'], df['total_return_pct'])
axes[0].set_xlabel('Total Costs ($)'); axes[0].set_ylabel('Net Return (%)')
axes[0].set_title('Higher Costs = Lower Returns')

# 2. Commission Sensitivity
comm_means = df.groupby('commission_pct')['total_return_pct'].mean()
axes[1].plot(comm_means.index * 100, comm_means.values, marker='o')
axes[1].set_xlabel('Commission (%)'); axes[1].set_title('Commission Impact')

# 3. Friction Buckets
friction_bins = ['Ideal (0%)', 'Retail (0.2%)', 'Worst (0.3%)']
friction_means = df.groupby(pd.cut(df['total_friction'],
              bins=[0, 0.001, 0.002, 1], labels=friction_bins,
              include_lowest=True), observed=True)['total_return_pct'].mean()
axes[2].bar(friction_bins, friction_means.values)
axes[2].set_ylabel('Net Return (%)'); axes[2].set_title('Friction Impact')

plt.tight_layout()
plt.savefig('cost_impact_charts.png')
plt.show()


The charts clearly show an inverse relationship between execution costs and strategy performance. Even small costs greatly increase losses for this marginal strategy, with commission displaying a strong linear sensitivity. Total friction buckets indicate that “retail” levels (0.2%) can reduce returns by over five percentage points compared to optimal conditions.


We will now determine how sensitive returns are to each cost component through linear regression across all scenarios. The process involves converting percentages to actual values, building a model that predicts total return based on commission, spread, and slippage, and then illustrating each cost’s marginal impact per percentage point.



# Express costs in percent for readability; keep the original columns intact
metrics_df['comm_pct'] = metrics_df['commission_pct'] * 100
metrics_df['sprd_pct'] = metrics_df['spread_pct'] * 100
metrics_df['slip_pct'] = metrics_df['slippage_base_pct'] * 100

X = metrics_df[['comm_pct', 'sprd_pct', 'slip_pct']]
y = metrics_df['total_return_pct']

model = LinearRegression().fit(X, y)
coefs = pd.DataFrame({
    'Cost': ['Commission', 'Spread', 'Slippage'],
    'Impact_per_%': model.coef_
})
print(coefs.sort_values('Impact_per_%'))

And the results are:



         Cost  Impact_per_%
2    Slippage    -97.184237
0  Commission    -36.458293
1      Spread    -18.401344

The regression shows slippage has the greatest damaging effect, reducing returns by roughly 97.2 percentage points per 1% of slippage, followed by commissions (-36.5) and spreads (-18.4). This ordering highlights that slippage is predominant in high-frequency or low-liquidity situations, illustrating why execution quality often matters more than headline broker fees for systematic strategies.
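As a rough illustration (treating the fit as purely linear, which it is not exactly), plugging the "realistic" scenario from earlier, 0.1% commission, 0.05% spread, and 0.05% slippage, into these coefficients estimates the drag:

```python
# Coefficients from the regression output above (return impact per 1% of cost)
impact = {'commission': -36.458293, 'spread': -18.401344, 'slippage': -97.184237}

# The "realistic" scenario from earlier, expressed in percent
costs = {'commission': 0.10, 'spread': 0.05, 'slippage': 0.05}

est_drag = sum(impact[k] * costs[k] for k in impact)
print(f"estimated drag: {est_drag:.1f} percentage points")  # about -9.4
```

This lands in the same ballpark as the 12-point drag measured directly; the gap reflects that the true cost-return relationship across the grid is not perfectly linear.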


The next analysis isolates each cost component's real-world drag by comparing average returns when only that cost is active (the others zeroed out) against the zero-cost baseline, ranking their practical impact across the tested ranges.



# Isolate each cost's real-world drag: average return when only that cost
# is active, compared against the zero-cost baseline
base_mask = ((metrics_df['commission_pct'] == 0) &
             (metrics_df['spread_pct'] == 0) &
             (metrics_df['slippage_base_pct'] == 0))
base_return = metrics_df[base_mask]['total_return_pct'].iloc[0]

comm_only = ((metrics_df['spread_pct'] == 0) & (metrics_df['slippage_base_pct'] == 0)
             & (metrics_df['commission_pct'] > 0))
spread_only = ((metrics_df['commission_pct'] == 0) & (metrics_df['slippage_base_pct'] == 0)
               & (metrics_df['spread_pct'] > 0))
slip_only = ((metrics_df['commission_pct'] == 0) & (metrics_df['spread_pct'] == 0)
             & (metrics_df['slippage_base_pct'] > 0))

comm_drag = metrics_df[comm_only]['total_return_pct'].mean() - base_return
spread_drag = metrics_df[spread_only]['total_return_pct'].mean() - base_return
slip_drag = metrics_df[slip_only]['total_return_pct'].mean() - base_return

print(f"""
REAL COST IMPACT RANKING:
Spread {max(param_grid['spread_pct'])*100:.1f}% → {spread_drag:.1f}% drag
Commission {max(param_grid['commission_pct'])*100:.1f}% → {comm_drag:.1f}% drag
Slippage {max(param_grid['slippage_base_pct'])*100:.1f}% → {slip_drag:.1f}% drag
""")

With the following results:



REAL COST IMPACT RANKING:
Spread 0.1% → -6.1% drag
Commission 0.2% → -8.5% drag
Slippage 0.1% → -6.1% drag

Commissions show the largest isolated drag, cutting performance by 8.5 percentage points from a maximum tested rate of just 0.2%, while spread and slippage each contribute roughly 6.1 points of drag from an even lower maximum of 0.1%. Per unit of cost, then, spread and slippage are at least as damaging as commissions, consistent with the regression ranking, which makes both broker selection and execution quality key leverage points for systematic strategies.


Key takeaways

The key takeaways from the above analysis can be summarised as follows:


  • Execution costs amplify losses: Realistic frictions turn a marginal strategy into an outright loser, adding significant cost drag beyond the gross edge.

  • Slippage most destructive per unit: Regression analysis ranks slippage highest in impact per percentage point, ahead of commissions and spreads.

  • Commissions punch above weight: Despite a modest maximum rate, they generate the largest isolated drag, even though spread and slippage are more damaging per unit of cost.

  • Clear inverse cost-return relationship: Charts show that higher costs directly reduce returns; retail friction levels substantially cut performance relative to the ideal.

  • Model frictions from the start: With the trade count fixed across scenarios, the performance gap is attributable purely to costs eroding the gross edge, which is essential to assess before going live.


Next Steps and Final Thoughts

In this article, we built a complete backtesting pipeline from the ground up. We extracted intraday price data for AAPL using EODHD’s Historical Intraday API, resampled it to a 4-hour timeframe, and implemented a simple moving average crossover strategy. We then stress-tested that strategy across 64 cost scenarios, varying commission, bid-ask spread, and slippage to quantify their individual and combined drag on performance. The results were clear: realistic frictions turned an already marginal strategy into a significantly worse one, with cost drag alone accounting for around 12 percentage points of additional loss.


There is plenty of room to extend this framework further. A natural next step is to incorporate position-sizing constraints and market-impact models that scale slippage dynamically with order size relative to average daily volume. You could also introduce walk-forward validation to avoid overfitting cost assumptions to a single period, or expand the asset universe to compare execution frictions across equities, ETFs, and futures. Adding financing costs for overnight positions and borrowing costs for short trades would make the model even more representative of live trading conditions.
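For the market-impact extension, a common starting point is a square-root impact model, where cost grows with the square root of order size relative to average daily volume. A minimal sketch (the calibration constant `k` is hypothetical and would need fitting to real fills):

```python
import math

def sqrt_impact_bps(order_shares: float, adv_shares: float, k: float = 10.0) -> float:
    """Hypothetical square-root market-impact model: cost in basis points
    grows with the square root of order size relative to ADV."""
    return k * math.sqrt(order_shares / adv_shares)

# An order worth 1% of ADV costs ~1 bp; quadrupling the size only doubles the impact
print(sqrt_impact_bps(10_000, 1_000_000), sqrt_impact_bps(40_000, 1_000_000))
```

Such a function could replace the flat `slippage_base_pct` inside `adjust_price`, letting slippage scale with the order's share of traded volume.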


Ultimately, the lesson here is not specific to any one strategy. Every systematic trader faces the same fundamental challenge: the gap between gross edge and net reality. Execution costs are not a footnote; they are a defining variable in whether a strategy survives production. Model them early, model them honestly, and let that discipline carry through every stage of your development process.
