Causal AI for Markets: What Really Drives Stock Risk?
- Nikhil Adithyan
Measuring how valuation, volatility, and profitability cause drawdowns

Every investor talks about risk, but few can explain what truly drives it.
We measure volatility, beta, or max drawdown, and assume we understand exposure. Yet these numbers are only outcomes. They tell us how much pain existed, not why it happened.
Sometimes a high P/E stock crashes harder. Sometimes a low-margin business stays resilient. Correlation alone can’t answer what actually causes those reactions. That’s where most models stop, and where this story begins.
This experiment tries to answer one deceptively simple question:
Do fundamentals and market traits directly cause deeper drawdowns, or do they just move in sync with them?
To find out, I built a causal AI framework that analyzes how valuation, volatility, and profitability affect a stock’s downside. The data comes from EODHD’s historical APIs, covering S&P 500 stocks from 2018 to today, which is more than enough to see how company characteristics shape real risk, not just statistical noise.
The goal isn’t to forecast crashes or design a trading signal. It’s to understand what truly makes a portfolio fragile, and how causal inference can separate reason from coincidence.
From Correlation to Causation
It’s easy to assume that if two things move together, one must cause the other.
High valuations, rising volatility, falling prices: the patterns often look convincing enough. I used to take those relationships at face value until I realized that correlation is mostly storytelling. It looks logical, it feels logical, but it rarely holds up when conditions change.
That’s where causal inference comes in.
Instead of asking what usually happens together, it asks what would have happened if something were different.
For instance, if two companies were identical except for their valuation, would the higher-valued one experience a deeper drawdown? That’s the kind of “what-if” question correlation can’t answer, but causal models can.
In this study, I treated valuation, volatility, and profitability as the key “treatments.” The idea was to see how each factor, when changed, affects a company’s downside risk while holding everything else constant. It’s less about prediction and more about simulation, i.e., building alternate realities to see which traits consistently lead to more pain.
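To make that concrete, here is a small toy simulation (purely synthetic numbers, unrelated to the S&P 500 data used later) where a confounder, think firm size, drives both the treatment and the outcome, so a naive treated-minus-control comparison misses the true effect:
import numpy as np

rng = np.random.default_rng(0)
n = 1000
size = rng.normal(0, 1, n)                               # confounder: think firm size
treated = (rng.normal(0, 1, n) + size > 0).astype(int)   # larger firms are more likely "treated"
y0 = -0.20 + 0.05 * size + rng.normal(0, 0.02, n)        # drawdown if untreated: big firms fall less
y1 = y0 - 0.05                                           # true causal effect: 5 pts deeper drawdown
observed = np.where(treated == 1, y1, y0)                # we only ever observe one of the two outcomes

print('true effect:', np.mean(y1 - y0))   # -0.05 by construction
print('naive diff: ', observed[treated == 1].mean() - observed[treated == 0].mean())
In this toy setup the naive difference lands near zero even though every treated firm suffers exactly five extra points of drawdown. Correcting for that kind of confounding is what the estimators later in this article are for.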
I used EODHD’s data to make this possible: S&P 500 history from 2018 onward, packed with price, volume, and fundamental data. The quality and consistency of that feed made it possible to treat this like a real-world experiment instead of just another backtest.
The Setup: Getting the Data Right
Before getting into the causal analysis, I needed a proper foundation: clean, consistent data. Everything starts here.
Importing Packages
I first imported the essential packages. Nothing fancy, just what’s needed to pull data, shape it, and run the models later.
import pandas as pd
import numpy as np
import requests
import time
from datetime import datetime, timedelta
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
import matplotlib.pyplot as plt
This covers the basics: pandas and numpy for handling data, requests for API calls, scikit-learn for the propensity and outcome models used later, and matplotlib for plotting.
Config and Extracting Tickers
Next, I configured the API and built a small universe of S&P 500 tickers. Instead of manually typing them out, I pulled the current list directly from the EODHD Fundamental Data endpoint.
api_key = 'YOUR EODHD API KEY'
base_url = 'https://eodhd.com/api'
tickers_url = f'{base_url}/fundamentals/GSPC.INDX?api_token={api_key}&fmt=json'
tickers_raw = requests.get(tickers_url).json()
tickers = list(pd.DataFrame(tickers_raw['Components']).T['Code'])
tickers = [item + '.US' for item in tickers]
start_date = '2018-01-01'
end_date = datetime.today().strftime('%Y-%m-%d')
EODHD’s /fundamentals/GSPC.INDX endpoint provides the full list of S&P 500 constituents. By appending .US to each code, I ensured compatibility with the /eod/ endpoint for fetching daily prices.
Fetching Historical Data
Once the tickers were ready, I moved on to fetching price data and calculating returns.
The function below retrieves daily OHLC values using EODHD’s Historical Data endpoint for each symbol between the start and end date, then stitches everything together.
def fetch_eod_prices(ticker, start, end):
    url = f'{base_url}/eod/{ticker}?from={start}&to={end}&api_token={api_key}&fmt=json'
    r = requests.get(url).json()
    if not isinstance(r, list) or not r:
        return pd.DataFrame(columns=['date','close','ticker'])
    df = pd.DataFrame(r)[['date','close']]
    df['ticker'] = ticker
    print(f'{ticker} FETCHED DATA')
    time.sleep(1)
    return df
frames = [fetch_eod_prices(t, start_date, end_date) for t in tickers]
prices_df = pd.concat(frames, ignore_index=True).dropna(subset=['close'])
prices_df['date'] = pd.to_datetime(prices_df['date'])
prices_df = prices_df.sort_values(['ticker','date'])
prices_df['ret'] = prices_df.groupby('ticker')['close'].pct_change().fillna(0.0)
prices_df = prices_df[['ticker','date','close','ret']]
prices_df = prices_df.reset_index(drop=True)
prices_df
This creates a unified dataset of over a million rows across all S&P 500 tickers, each with the closing price and daily return. Here is what the final dataframe looks like:

The output confirms everything is aligned:
Each ticker is ordered by date.
The first row per ticker shows a return of zero, which validates the group logic (a quick check is sketched below).
The data is dense, balanced, and ready for feature engineering in the next step.
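A minimal sanity check along those lines, assuming prices_df as built above:
# each ticker's first return should be exactly zero after groupby + pct_change + fillna
first_rets = prices_df.groupby('ticker')['ret'].first()
assert (first_rets == 0.0).all(), 'found a ticker whose first return is not zero'

# and dates should be sorted within each ticker
assert prices_df.groupby('ticker')['date'].apply(lambda s: s.is_monotonic_increasing).all()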
Measuring Stress: Calculating Drawdowns
Once the return data was ready, I needed to turn it into something that represents risk.
Returns tell how a stock moved, but not how it behaved under pressure. Drawdown captures exactly that: how deep a stock fell from its recent peak during stress.
For this, I defined two major market stress windows:
COVID crash (Feb–Apr 2020)
Rate-hike shock (Aug–Oct 2022)
Each window captures a different macro shock, one liquidity-driven, the other inflation-driven. This gives a good contrast for causal inference later.
Calculating drawdowns
def max_drawdown(series: pd.Series) -> float:
    cummax = series.cummax()
    dd = series / cummax - 1.0
    return float(dd.min())

stress_windows = [
    ('2020-02-15', '2020-04-30'),
    ('2022-08-01', '2022-10-31')
]

rows = []
for t in prices_df['ticker'].unique():
    df_t = prices_df.loc[prices_df['ticker'] == t, ['date','close']].set_index('date').sort_index()
    for i, (start, end) in enumerate(stress_windows, start=1):
        s = df_t.loc[start:end, 'close']
        if s.empty:
            continue
        rows.append({
            'ticker': t,
            'window_id': i,
            'start': start,
            'end': end,
            'max_dd': max_drawdown(s)
        })

drawdowns_df = pd.DataFrame(rows)
drawdowns_df
The max_drawdown() function tracks the running peak of a stock’s price and measures the percentage drop from that peak.
Then, for each ticker, I extract prices within the defined stress windows and compute the worst drawdown in that range.
By iterating through two different market crises, the result shows how each company handled external shocks.
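For intuition, here is a tiny worked example with made-up prices (not from the dataset):
toy_prices = pd.Series([100.0, 110.0, 90.0, 95.0])
# running peak: 100, 110, 110, 110 -> the worst drop is from 110 down to 90
print(max_drawdown(toy_prices))   # -0.1818..., i.e. roughly an 18% peak-to-trough fall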

Each stock now has up to two entries, one per stress event, along with its corresponding maximum drawdown.
Fetching Fundamentals & Defining Treatments
With the price and drawdown data ready, the next step was to define the “treatments.”
In causal inference, a treatment represents an event or characteristic whose effect we want to measure. Here, I wanted to understand how certain financial traits, like valuation, volatility, and profitability, affect a company’s risk profile during market stress.
The three treatments I picked were:
High PE ratio: stocks that are priced expensively relative to their earnings.
High beta: stocks that move aggressively with the market.
Low profit margin: stocks with thinner profitability cushions.
To build these, I used EODHD’s Fundamentals API, which provides all the key metrics I needed.
Fetching Fundamentals Data
def fetch_fundamentals(ticker):
    url = f'{base_url}/fundamentals/{ticker}?api_token={api_key}&fmt=json'
    r = requests.get(url).json()
    if not isinstance(r, dict):
        return pd.DataFrame()
    data = r.get('Highlights', {})
    general = r.get('General', {})
    val = r.get('Valuation', {})
    row = {
        'ticker': ticker,
        'sector': general.get('Sector', np.nan),
        'market_cap': data.get('MarketCapitalization', np.nan),
        'beta': (r.get('Technicals') or {}).get('Beta', np.nan),
        'eps': data.get('EarningsShare', np.nan),
        'div_yield': data.get('DividendYield', np.nan),
        'net_margin': data.get('ProfitMargin', np.nan),
        'pe_ratio': val.get('TrailingPE'),
        'pb_ratio': val.get('PriceBookMRQ'),
    }
    print(f'{ticker} FETCHED DATA')
    time.sleep(1)
    return pd.DataFrame([row])
fund_frames = [fetch_fundamentals(t) for t in tickers]
fundamentals_df = pd.concat(fund_frames, ignore_index=True)
num_cols = ['market_cap','beta','eps','div_yield','net_margin','pe_ratio','pb_ratio']
fundamentals_df[num_cols] = fundamentals_df[num_cols].apply(pd.to_numeric, errors='coerce')
fundamentals_df
Each call to /fundamentals/{ticker} returns multiple nested blocks: General, Highlights, Valuation, and Technicals. I extracted what I needed from each:
General: Sector
Highlights: Earnings per share, profit margin, dividend yield, market cap
Valuation: PE and PB ratios
Technicals: Beta
All fields are converted into a single flat record per ticker and combined into one large DataFrame.

The structure is wide enough to hold all core features and still compact enough to merge easily with returns later.
Merging data and creating treatments
This step creates the dataset used for causal inference, combining financial performance with firm-level traits.
# merge fundamentals with drawdowns per ticker-window
combined_df = drawdowns_df.merge(fundamentals_df, on='ticker', how='left')
# basic controls
combined_df['log_mcap'] = np.log(combined_df['market_cap'].replace({0: np.nan}))
combined_df['sector'] = combined_df['sector'].fillna('Unknown')
combined_df.head()

Each row here represents a stock-window pair: its drawdown during a specific stress period, valuation profile, and sector context.
Log-transforming market cap helps stabilize the data since firm sizes vary wildly, while sector labels later help balance comparisons.
Defining Treatment Flags
With the merged dataset ready, I defined what qualifies as “treated.” For example, a stock is treated for a high PE if it falls above the 70th percentile PE within its own sector. That ensures comparisons happen within similar industries instead of globally.
def sector_pe_thresholds(df, min_count=5, q=0.70):
    th = df.groupby('sector')['pe_ratio'].quantile(q).rename('pe_thr').to_dict()
    global_thr = df['pe_ratio'].quantile(q)
    return {s: (thr if not np.isnan(thr) else global_thr) for s, thr in th.items()}

pe_thr_map = sector_pe_thresholds(fundamentals_df)

combined_df['high_pe'] = combined_df.apply(
    lambda r: 1 if pd.notna(r['pe_ratio']) and r['pe_ratio'] > pe_thr_map.get(r['sector'], np.nan) else 0,
    axis=1
)
combined_df['high_beta'] = (combined_df['beta'] > 1.2).astype(int) # market sensitivity proxy
combined_df['low_margin'] = (combined_df['net_margin'] < 0).astype(int) # weak profitability
treat_cols = ['high_pe', 'high_beta', 'low_margin']
ctrl_cols = ['log_mcap', 'sector']
model_df = combined_df[['ticker','window_id','start','end','max_dd','pe_ratio','beta','net_margin','market_cap'] + treat_cols + ctrl_cols].copy()
model_df.head(10)
The logic is simple but deliberate:
High PE stocks represent overvalued firms within each sector.
High beta stocks are those with a beta above 1.2, i.e., roughly 20% more sensitive than the market as a whole.
Low margin stocks reflect profitability risk.
The resulting dataset looks like this:

Each stock now carries its drawdown, financial metrics, and binary treatment flags. We have exactly what’s needed for causal analysis in the next section.
Estimating Causal Effects
With all the inputs ready, drawdowns, fundamentals, and treatment flags, it was time to move beyond correlations and actually test for causation.
The question was straightforward:
How much more (or less) do high-PE, high-beta, and low-margin stocks fall during market stress after accounting for factors like size and sector?
That’s a classic causal inference problem. Instead of using traditional regressions, I decided to rely on two modern estimators:
Inverse Probability Weighting (IPW) reweights observations to mimic a randomized experiment.
Doubly Robust (DR) Estimator combines regression and reweighting to reduce bias.
Implementing the Estimators
The first function computes the IPW estimate. It fits a logistic model to estimate the propensity score, which is the probability of being “treated” (for example, being a high-PE stock) given the controls.
Then it adjusts the sample weights to make the treated and control groups comparable.
The second function goes a step further by combining a regression model with weighting to create a doubly robust estimate. Even if one of the models is slightly misspecified, the results tend to remain stable.
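In equation form (my notation, matching the code that follows): write $T_i$ for the treatment flag, $y_i$ for the max drawdown, $\hat{p}(x_i)$ for the estimated propensity score, and $\hat{\mu}_1, \hat{\mu}_0$ for the outcome model's predictions with the treatment forced to 1 or 0. The doubly robust score computed as dr_scores in causal_dr below is
$$
\psi_i = \frac{T_i\,\big(y_i - \hat{\mu}_1(x_i)\big)}{\hat{p}(x_i)}
       - \frac{(1 - T_i)\,\big(y_i - \hat{\mu}_0(x_i)\big)}{1 - \hat{p}(x_i)}
       + \hat{\mu}_1(x_i) - \hat{\mu}_0(x_i),
\qquad
\widehat{\mathrm{ATE}}_{DR} = \frac{1}{n}\sum_{i=1}^{n} \psi_i
$$
and its average stays consistent as long as either the propensity model or the outcome model is correctly specified, which is where the "doubly robust" name comes from.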
def causal_ipw(data, treatment, outcome, controls):
    df = data.dropna(subset=[treatment, outcome] + controls).copy()
    X = pd.get_dummies(df[controls], drop_first=True)
    T = df[treatment]
    y = df[outcome]
    # propensity score model
    ps_model = LogisticRegression(max_iter=500)
    ps_model.fit(X, T)
    p = ps_model.predict_proba(X)[:, 1].clip(0.01, 0.99)
    # inverse weighting
    weights = T/p + (1-T)/(1-p)
    ate = np.mean(weights * (T - p) * y) / np.mean(weights * (T - p)**2)
    return ate

def causal_dr(data, treatment, outcome, controls):
    df = data.dropna(subset=[treatment, outcome] + controls).copy()
    X = pd.get_dummies(df[controls], drop_first=True)
    T = df[treatment].values
    y = df[outcome].values
    # models
    m_y = RandomForestRegressor(n_estimators=200, random_state=42)
    m_t = LogisticRegression(max_iter=500)
    m_y.fit(np.column_stack([X, T]), y)
    m_t.fit(X, T)
    p = m_t.predict_proba(X)[:, 1].clip(0.01, 0.99)
    y_hat_1 = m_y.predict(np.column_stack([X, np.ones(len(X))]))
    y_hat_0 = m_y.predict(np.column_stack([X, np.zeros(len(X))]))
    dr_scores = (T*(y - y_hat_1)/p) - ((1-T)*(y - y_hat_0)/(1-p)) + (y_hat_1 - y_hat_0)
    ate = np.mean(dr_scores)
    return ate

results = []
for t in ['high_pe','high_beta','low_margin']:
    ate_ipw = causal_ipw(model_df, t, 'max_dd', ['log_mcap','sector'])
    ate_dr = causal_dr(model_df, t, 'max_dd', ['log_mcap','sector'])
    results.append({'treatment': t, 'ate_ipw': ate_ipw, 'ate_dr': ate_dr})

effects_df = pd.DataFrame(results)
effects_df
Results

Interpretation
This table is where things start getting interesting.
Each row represents one treatment, and the two columns (ate_ipw and ate_dr) show how that characteristic influences the maximum drawdown on average.
Here’s what stands out:
High PE stocks fall about 0.21 more (roughly 21 percentage points of additional drawdown, per the IPW estimate) during stress periods. The DR estimator shows a smaller effect, but the direction is consistent: overvalued stocks feel the heat first.
High beta stocks show a drawdown about 0.25 deeper, which is not surprising; they amplify market moves in both directions.
Low-margin companies are hit the hardest, with nearly 0.37 of additional downside. This suggests that profitability cushions truly help during panic phases.
While the magnitudes differ slightly across estimators, the pattern is clear. Fragile fundamentals amplify stress.
And that’s exactly the kind of causal evidence I wanted to see. Not just which stocks correlate with volatility, but which ones actually cause higher risk exposure when the market turns.
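One caveat: these are point estimates with no error bars. A simple way to gauge how stable they are, sketched here assuming model_df and causal_dr from above are still in scope (the 100 resamples and the focus on high_beta are arbitrary choices for the sketch, and the random forests make it slow), is a bootstrap over the stock-window rows:
boot_ates = []
for i in range(100):
    sample = model_df.sample(frac=1.0, replace=True, random_state=i)   # resample rows with replacement
    if sample['high_beta'].nunique() < 2:                              # skip degenerate resamples
        continue
    boot_ates.append(causal_dr(sample, 'high_beta', 'max_dd', ['log_mcap','sector']))

low, high = np.percentile(boot_ates, [2.5, 97.5])
print(f'high_beta DR effect, 95% bootstrap interval: [{low:.3f}, {high:.3f}]')
If the interval stays clearly on one side of zero, the direction of the effect is unlikely to be an artifact of a few influential stocks.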
Visualizing and Validating the Effects
Once I had the causal estimates, I wanted to see what they actually looked like. Numbers can tell you the “what,” but visuals often reveal the “how.”
At this point, the objective wasn’t to add more math, but to see whether the intuition from the earlier steps holds up when plotted.
This stage also acts as a sanity check. If the visual patterns contradict the statistical results, something’s usually off in the setup. But if both align, you can be more confident that the model’s logic is sound.
Plotting the Causal Effects
The first visualization compares the estimated effects from the two causal models, IPW and DR.
This helps check whether both estimators point in the same direction and how strongly each treatment affects the drawdown.
effects_plot = effects_df.melt(id_vars='treatment', value_vars=['ate_ipw','ate_dr'], var_name='method', value_name='ate')

plt.figure(figsize=(6,4))
for m in effects_plot['method'].unique():
    subset = effects_plot[effects_plot['method']==m]
    plt.bar(subset['treatment'], subset['ate'], alpha=0.7, label=m)
plt.axhline(0, color='black', linewidth=0.8)
plt.ylabel('Estimated Causal Effect on Max Drawdown')
plt.title('Causal Effect Estimates (IPW vs DR)')
plt.legend()
plt.tight_layout()
plt.show()

The bar chart above captures the story behind all those computations.
Each bar represents the estimated causal effect of a financial trait on maximum drawdown.
Across the three treatments, the trend is consistent. High PE, high beta, and low margin all deepen downside risk, with low-margin stocks showing the sharpest impact.
The difference between the blue and orange bars (IPW and DR) is minor, which is exactly what you want to see. It means both models are converging toward the same conclusion.
The takeaway is clear: fundamental fragility doesn’t just coincide with volatility. It drives it.
Checking Overlap Between Treated and Control Groups
Before trusting any causal estimate, it’s essential to check whether the treated and control samples are comparable.
In theory, causal inference assumes overlap, meaning every treated stock has a comparable control stock with similar characteristics, except for the treatment itself.
If there’s no overlap, the results become meaningless because we’re comparing two entirely different worlds.
Propensity Score Distribution
To test this, I plotted the propensity scores for each treatment.
The score represents the probability of a stock being “treated” (for instance, having a high PE ratio) given its market cap and sector.
for t in ['high_pe','high_beta','low_margin']:
    df = model_df.dropna(subset=[t,'log_mcap','sector']).copy()
    X = pd.get_dummies(df[['log_mcap','sector']], drop_first=True)
    T = df[t]
    ps_model = LogisticRegression(max_iter=500)
    ps_model.fit(X, T)
    ps = ps_model.predict_proba(X)[:,1]
    plt.figure(figsize=(6,4))
    plt.hist(ps[T==1], bins=20, alpha=0.6, label='treated', color='tab:blue')
    plt.hist(ps[T==0], bins=20, alpha=0.6, label='control', color='tab:orange')
    plt.title(f'Propensity Score Overlap — {t}')
    plt.xlabel('propensity score')
    plt.ylabel('count')
    plt.legend()
    plt.tight_layout()
    plt.show()

The histograms above show how well the model managed to balance the treated and control groups across the three traits.
For high PE stocks, there’s a healthy overlap between blue and orange bars, which means the weighting process has done its job fairly well.
For high beta, the distributions are more scattered, hinting that these stocks behave differently enough to make perfect matching difficult.
Low margin companies show the least overlap, which is expected since unprofitable firms often cluster in specific sectors like biotech or early-stage tech.
While the overlaps aren’t perfect, they’re sufficient to move forward. The key point is that there’s enough shared ground between treated and control groups to make causal comparison meaningful.
Verifying Covariate Balance
Even though the visual overlaps looked decent, I wanted a numerical check to confirm that the weighting step did what it was supposed to.
The idea is simple: after reweighting, the treated and control groups should look nearly identical in their key characteristics, except for the treatment itself.
To test this, I compared the mean difference in log market capitalization between high-beta stocks and their controls, both before and after weighting.
t = 'high_beta'
df = model_df.dropna(subset=[t,'log_mcap','sector']).copy()
X = pd.get_dummies(df[['log_mcap','sector']], drop_first=True)
T = df[t]
ps_model = LogisticRegression(max_iter=500)
ps_model.fit(X, T)
ps = ps_model.predict_proba(X)[:,1]
w = T/ps + (1-T)/(1-ps)
m_treat_raw = df.loc[T==1,'log_mcap'].mean()
m_ctrl_raw = df.loc[T==0,'log_mcap'].mean()
m_treat_w = np.average(df['log_mcap'], weights=T*w)
m_ctrl_w = np.average(df['log_mcap'], weights=(1-T)*w)
print('Covariate balance for log_mcap (high_beta)')
print(f'Raw diff: {m_treat_raw - m_ctrl_raw:.4f}')
print(f'Weighted diff: {m_treat_w - m_ctrl_w:.4f}')

Before weighting, the average difference in log market cap between high-beta and low-beta stocks was around 0.19, which means the two groups weren’t perfectly comparable.
After applying the weights, that difference dropped to 0.0008 (nearly zero).
This confirms that the balancing process worked as intended. By reweighting observations according to their propensity scores, the treated and control groups now share almost identical distributions in their control variables.
That alignment is what gives the causal estimates credibility. Without it, we would simply be comparing random subsets of the market instead of isolating the effect of specific traits like valuation, volatility, or profitability.
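A more standard way to report the same comparison, sketched below assuming df, T, and w from the previous snippet are still in memory, is the standardized mean difference (SMD), which scales the gap by the pooled standard deviation; values below roughly 0.1 are conventionally treated as balanced:
def smd(x, treat, weights=None):
    # standardized mean difference between treated and control for one covariate
    if weights is None:
        weights = np.ones(len(x))
    m1 = np.average(x[treat == 1], weights=weights[treat == 1])
    m0 = np.average(x[treat == 0], weights=weights[treat == 0])
    pooled = np.sqrt((x[treat == 1].std()**2 + x[treat == 0].std()**2) / 2)
    return (m1 - m0) / pooled

x = df['log_mcap'].values
treat = T.values
print('SMD before weighting:', round(smd(x, treat), 4))
print('SMD after weighting: ', round(smd(x, treat, weights=w.values), 4))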
What This Means for Risk Management
After spending hours inside Python, it’s easy to forget what this all connects to: the real market. The value of causal modeling is that it reframes how we think about risk. Instead of assuming or guessing which factors drive volatility, it gives us a measurable way to quantify how much each one truly contributes.
In traditional analysis, drawdowns are treated as outcomes of randomness or sentiment. But once we look through a causal lens, the randomness begins to fade. We start to see structure and the patterns that repeat when certain traits combine.
A portfolio manager can use this insight to design more resilient allocations. For instance, if low-margin firms consistently amplify downside during stress windows, those positions can be hedged or sized down ahead of similar conditions. If high-beta stocks display asymmetric losses even after accounting for size and sector, then they deserve a risk premium or tighter stop rules.
The same principle applies to stress testing.
Rather than simulating arbitrary shocks, we can construct counterfactual scenarios:
What would have happened if the portfolio held fewer high-PE names during a sell-off?
How much of the observed risk was structural, and how much was avoidable?
Causal models turn these questions into measurable answers. They allow risk managers to move from intuition to evidence, from correlation to attribution.
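As a rough illustration of the first question, here is a back-of-the-envelope counterfactual built on combined_df from earlier. The equal-weight portfolio, the COVID window, and the 50% trim of high-PE names are all arbitrary choices for the sketch, and averaging per-stock drawdowns is only a proxy for a true portfolio drawdown:
# equal-weight "portfolio pain" in the COVID window, actual vs. high-PE positions halved
covid = combined_df[combined_df['window_id'] == 1].dropna(subset=['max_dd']).copy()

actual_dd = covid['max_dd'].mean()                      # equal-weight average of per-stock drawdowns
weights = np.where(covid['high_pe'] == 1, 0.5, 1.0)     # trim high-PE names to half weight
counterfactual_dd = np.average(covid['max_dd'], weights=weights)

print(f'Equal-weight drawdown proxy, actual holdings: {actual_dd:.3f}')
print(f'Same portfolio with high-PE positions halved: {counterfactual_dd:.3f}')
The gap between the two numbers is a first, crude answer to how much of the observed pain was tied to high-PE exposure.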
Closing Thoughts
This entire workflow, from raw data to counterfactual reasoning, was never meant to predict the next crash or rally.
It was meant to understand why portfolios behave the way they do when markets turn unstable.
By combining fundamental traits, stress windows, and causal estimators, we built a framework that explains structural drivers of risk rather than chasing signals.
Each piece of code served a purpose: defining clean treatments, balancing controls, estimating effects, and validating the logic through overlap and weighting.
Causal inference will not replace forecasting or technical analysis. But it adds a new dimension, one focused on understanding rather than prediction. When markets fall, it helps separate what was inevitable from what was amplified by exposure to specific traits.
Markets don’t just move randomly. They react to structure, and causal AI helps us see that structure clearly.