From Signals to Verdicts: Building a Financial Research Copilot with MCP and Python
- Nikhil Adithyan

Part 2: building the reasoning layer, verdict engine, and final research memo

In the first part, we built the foundation of the copilot. We set up the MCP client, parsed a natural-language thesis, pulled historical prices and fundamentals, and converted both into structured signal layers. At that point, the system could already understand a research prompt and gather the data needed to evaluate it.
But signals alone are not enough. A useful research copilot should be able to look at those signals and answer a harder question: what actually supports the thesis, what weakens it, and what is still missing before a judgment can be made?
That is what this second part is about. We will take the price and fundamentals signals from Part 1, turn them into supporting and weakening evidence, assign a verdict, and then generate the final research memo. This is the point where the project stops looking like a data pipeline and starts behaving like an actual thesis-testing system.
Where Part 1 Left Us
By the end of Part 1, the copilot could already handle the setup work needed before any real reasoning begins.
It could:
take a natural-language thesis and turn it into a structured request
pull historical prices and fundamentals through EODHD’s MCP layer
convert those inputs into reusable signal layers
So we are not starting this part from raw API responses. We already have two clean inputs to work with. One is the market side, built from price-based signals like return, volatility, drawdown, trend, and return-to-volatility. The other is the business side, built from fundamentals signals like margins, returns on capital, growth, valuation, revisions, and beta.
That changes the job in Part 2. We no longer need to worry about setup or data extraction. The task now is to take those signals and turn them into something more useful: supporting evidence, weakening evidence, missing evidence, a verdict, and the final research memo.
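For reference, the two signal layers carried over from Part 1 look roughly like this. The field names follow how they are used later in this part; the values are invented for illustration:

```python
# Hypothetical shapes of the two signal layers from Part 1.
# Field names match their usage later in this part; values are made up.
price_signals = {
    "ret_total": 0.08,        # total return over the lookback window
    "vol_annualized": 0.24,   # annualized volatility
    "max_drawdown": -0.12,    # worst peak-to-trough move
    "trend_slope": 0.0004,    # slope of a fitted price trend
    "ret_to_vol": 0.33,       # return-to-volatility ratio
}
fundamental_signals = {
    "operating_margin": 0.30,
    "profit_margin": 0.24,
    "roe": 0.28,
    "revenue_growth_yoy": 0.06,
    "pe_ratio": 28.5,
    "beta": 1.1,
}
print(sorted(price_signals))
```

Everything in Part 2 reads from these two dictionaries, so keeping their keys stable is what lets the evidence engine stay simple.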
Classifying the Thesis
Before the copilot can judge a thesis, it first needs to understand what kind of claim is being made. That matters because not every thesis should be tested the same way. A claim about controlled downside should care more about drawdown and volatility. A claim about business quality should lean more on margins, returns on capital, and growth. A claim about premium justification may need both business quality and valuation context.
So instead of jumping straight from signals to a verdict, we add a small classification step. This gives the system a short list of claim types to work with and a cleaner summary of the thesis.
Add this to core.py:
def classify_thesis(thesis):
    prompt = f"""
You are classifying a stock thesis into a few broad claim types.
Return only valid JSON like this:
{{
"claim_types": ["controlled_downside", "business_quality"],
"summary": "short restatement of the thesis"
}}
Allowed claim types:
- controlled_downside
- momentum_strength
- low_risk
- high_risk
- valuation_attractive
- valuation_expensive
- business_quality
- weak_business_quality
- premium_justified
- premium_not_justified
Rules:
- pick only the claim types that are clearly relevant
- do not invent extra labels
- if nothing fits strongly, return an empty list
- summary should be short and faithful
Thesis:
{thesis}
""".strip()
    r = oa.responses.create(
        model=model_name,
        input=[{"role": "user", "content": prompt}],
    )
    raw = r.output_text.strip()
    try:
        out = json.loads(raw)
    except Exception:
        raise RuntimeError(f"thesis classifier returned non-json text: {raw[:500]}")
    claim_types = out.get("claim_types", [])
    if not isinstance(claim_types, list):
        claim_types = []
    clean = []
    allowed = {
        "controlled_downside",
        "momentum_strength",
        "low_risk",
        "high_risk",
        "valuation_attractive",
        "valuation_expensive",
        "business_quality",
        "weak_business_quality",
        "premium_justified",
        "premium_not_justified",
    }
    for x in claim_types:
        x = str(x).strip()
        if x in allowed and x not in clean:
            clean.append(x)
    return {
        "claim_types": clean,
        "summary": str(out.get("summary", "")).strip(),
    }
This function keeps the model’s job narrow. It is not being asked to decide whether the thesis is right or wrong. It is only being asked to identify the kind of thesis it is dealing with. That makes the next step much cleaner, because the evidence engine no longer has to treat every prompt the same way.
The validation at the bottom is important too. Even though the model returns the labels, Python still filters them through an allowed set and removes anything unexpected. That keeps this step flexible, but still controlled.
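The filter is easy to see in isolation. Here is a standalone sketch with a trimmed allowed set and some hypothetical model output, showing that unknown labels and duplicates are dropped while order is preserved:

```python
# Trimmed allowed set and made-up model output, just to show the filter.
allowed = {"controlled_downside", "business_quality", "low_risk"}

raw_labels = ["business_quality", "margin_expansion", "business_quality", " low_risk "]

clean = []
for x in raw_labels:
    x = str(x).strip()                    # normalize stray whitespace
    if x in allowed and x not in clean:   # drop unknowns and duplicates
        clean.append(x)

print(clean)  # ['business_quality', 'low_risk']
```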
Turning Signals into Support, Contradiction, and Missing Evidence
This is the step where the copilot actually starts reasoning.
Up to this point, we have three things in hand. We have the thesis, we have the claim types, and we have the signal layers built from price data and fundamentals. But none of that is useful on its own unless the system can turn it into a clear argument. That means it needs to answer three questions for every thesis:
What in the data supports this claim?
What in the data weakens it?
What is still missing before we can judge it properly?
That is exactly what build_evidence_blocks() does. It takes the classified thesis, checks the relevant price and fundamentals signals, and sorts them into three buckets: support, contradiction, and missing evidence.
Add this to core.py:
def build_evidence_blocks(thesis, thesis_tags, price_signals, fundamental_signals):
    evidence_for = []
    evidence_against = []
    missing_evidence = []
    ret_total = price_signals.get("ret_total")
    vol = price_signals.get("vol_annualized")
    dd = price_signals.get("max_drawdown")
    trend = price_signals.get("trend_slope")
    ret_to_vol = price_signals.get("ret_to_vol")
    pe = fundamental_signals.get("pe_ratio") or fundamental_signals.get("trailing_pe")
    forward_pe = fundamental_signals.get("forward_pe")
    beta = fundamental_signals.get("beta")
    profit_margin = fundamental_signals.get("profit_margin")
    operating_margin = fundamental_signals.get("operating_margin")
    roa = fundamental_signals.get("roa")
    roe = fundamental_signals.get("roe")
    revenue_growth = fundamental_signals.get("revenue_growth_yoy")
    earnings_growth = fundamental_signals.get("earnings_growth_yoy")
    earnings_estimate_growth = fundamental_signals.get("earnings_estimate_growth")
    revenue_estimate_growth = fundamental_signals.get("revenue_estimate_growth")
    net_eps_revisions = fundamental_signals.get("net_eps_revisions_30d")
    claim_types = thesis_tags.get("claim_types", [])
    if "controlled_downside" in claim_types:
        if dd is not None:
            if dd > -0.15:
                evidence_for.append(f"Maximum drawdown was relatively contained at {dd:.2%}.")
            else:
                evidence_against.append(f"Maximum drawdown reached {dd:.2%}, which weakens the controlled-downside claim.")
        else:
            missing_evidence.append("No drawdown signal available to test downside control.")
    if "momentum_strength" in claim_types:
        if trend is not None and ret_total is not None:
            if trend > 0 and ret_total > 0:
                evidence_for.append(f"Trend was positive and total return over the window was {ret_total:.2%}.")
            else:
                evidence_against.append("Trend and total return do not strongly support a momentum-strength view.")
        else:
            missing_evidence.append("No usable trend or return signal available to test momentum.")
    if "low_risk" in claim_types:
        if vol is not None:
            if vol < 0.30:
                evidence_for.append(f"Annualized volatility was {vol:.2%}, which supports a lower-risk view.")
            else:
                evidence_against.append(f"Annualized volatility was {vol:.2%}, which weakens a low-risk thesis.")
        else:
            missing_evidence.append("No volatility signal available to test risk.")
    if "high_risk" in claim_types:
        if vol is not None:
            if vol >= 0.30:
                evidence_for.append(f"Annualized volatility was {vol:.2%}, which supports a higher-risk view.")
            else:
                evidence_against.append(f"Annualized volatility was only {vol:.2%}, which does not strongly support a high-risk thesis.")
        else:
            missing_evidence.append("No volatility signal available to test risk.")
    if "valuation_attractive" in claim_types:
        if pe is not None:
            if pe < 20:
                evidence_for.append(f"P/E is {pe:.2f}, which supports a more attractive valuation view.")
            elif pe > 30:
                evidence_against.append(f"P/E is {pe:.2f}, which weakens the attractive-valuation claim.")
        else:
            missing_evidence.append("No P/E metric available to test valuation attractiveness.")
        if forward_pe is not None and pe is not None:
            if forward_pe < pe:
                evidence_for.append(f"Forward P/E ({forward_pe:.2f}) is below trailing P/E ({pe:.2f}), which can support an improving earnings setup.")
    if "valuation_expensive" in claim_types or "premium_not_justified" in claim_types:
        if pe is not None:
            if pe > 30:
                evidence_for.append(f"P/E is {pe:.2f}, which supports an expensive-valuation view.")
            else:
                evidence_against.append(f"P/E is {pe:.2f}, which does not strongly support an expensive-valuation claim.")
        else:
            missing_evidence.append("No P/E metric available to test whether valuation looks expensive.")
    if "business_quality" in claim_types or "premium_justified" in claim_types:
        quality_hits = 0
        if operating_margin is not None:
            if operating_margin >= 0.25:
                evidence_for.append(f"Operating margin is {operating_margin:.2%}, which supports strong business quality.")
                quality_hits += 1
            else:
                evidence_against.append(f"Operating margin is {operating_margin:.2%}, which is not especially strong for a quality claim.")
        if profit_margin is not None:
            if profit_margin >= 0.20:
                evidence_for.append(f"Profit margin is {profit_margin:.2%}, which supports business quality.")
                quality_hits += 1
            else:
                evidence_against.append(f"Profit margin is {profit_margin:.2%}, which weakens a strong-quality thesis.")
        if roa is not None:
            if roa >= 0.10:
                evidence_for.append(f"ROA is {roa:.2%}, which supports efficient asset use.")
                quality_hits += 1
            else:
                evidence_against.append(f"ROA is {roa:.2%}, which does not strongly support a quality claim.")
        if roe is not None:
            if roe >= 0.20:
                evidence_for.append(f"ROE is {roe:.2%}, which supports strong capital efficiency.")
                quality_hits += 1
            else:
                evidence_against.append(f"ROE is {roe:.2%}, which is weaker than expected for a strong-quality thesis.")
        if revenue_growth is not None:
            if revenue_growth > 0:
                evidence_for.append(f"Quarterly revenue growth was {revenue_growth:.2%} YoY, which supports business momentum.")
                quality_hits += 1
            else:
                evidence_against.append(f"Quarterly revenue growth was {revenue_growth:.2%} YoY, which weakens the quality claim.")
        if earnings_growth is not None:
            if earnings_growth > 0:
                evidence_for.append(f"Quarterly earnings growth was {earnings_growth:.2%} YoY, which supports operating strength.")
                quality_hits += 1
            else:
                evidence_against.append(f"Quarterly earnings growth was {earnings_growth:.2%} YoY, which weakens the quality claim.")
        if earnings_estimate_growth is not None:
            if earnings_estimate_growth > 0:
                evidence_for.append(f"Forward earnings estimate growth is {earnings_estimate_growth:.2%}, which supports a healthier forward outlook.")
            else:
                evidence_against.append(f"Forward earnings estimate growth is {earnings_estimate_growth:.2%}, which weakens the quality argument.")
        if revenue_estimate_growth is not None:
            if revenue_estimate_growth > 0:
                evidence_for.append(f"Forward revenue estimate growth is {revenue_estimate_growth:.2%}, which supports ongoing business strength.")
            else:
                evidence_against.append(f"Forward revenue estimate growth is {revenue_estimate_growth:.2%}, which weakens the quality argument.")
        if net_eps_revisions is not None:
            if net_eps_revisions > 0:
                evidence_for.append(f"Net EPS revisions over the last 30 days are positive ({net_eps_revisions:.0f}), which supports improving expectations.")
            elif net_eps_revisions < 0:
                evidence_against.append(f"Net EPS revisions over the last 30 days are negative ({net_eps_revisions:.0f}), which weakens the thesis.")
        if quality_hits == 0:
            missing_evidence.append("This version could not extract enough direct business-quality metrics to test the quality claim.")
    if "weak_business_quality" in claim_types:
        if operating_margin is not None and operating_margin < 0.15:
            evidence_for.append(f"Operating margin is only {operating_margin:.2%}, which supports a weaker-quality view.")
        if profit_margin is not None and profit_margin < 0.10:
            evidence_for.append(f"Profit margin is only {profit_margin:.2%}, which supports a weaker-quality view.")
        if revenue_growth is not None and revenue_growth <= 0:
            evidence_for.append(f"Revenue growth is {revenue_growth:.2%} YoY, which supports a weaker-quality view.")
        if earnings_growth is not None and earnings_growth <= 0:
            evidence_for.append(f"Earnings growth is {earnings_growth:.2%} YoY, which supports a weaker-quality view.")
    if beta is not None:
        if beta > 1.2:
            evidence_against.append(f"Beta is {beta:.2f}, which suggests above-market sensitivity.")
        elif beta < 0.9:
            evidence_for.append(f"Beta is {beta:.2f}, which suggests below-market sensitivity.")
    else:
        missing_evidence.append("No beta value available.")
    if ret_to_vol is None:
        missing_evidence.append("No return-to-volatility signal available.")
    if not evidence_for and not evidence_against:
        missing_evidence.append("The current data is not enough to strongly support or reject the thesis.")
    return {
        "thesis": thesis,
        "thesis_summary": thesis_tags.get("summary", ""),
        "claim_types": claim_types,
        "evidence_for": evidence_for,
        "evidence_against": evidence_against,
        "missing_evidence": list(dict.fromkeys(missing_evidence)),
    }
The function looks long, but the logic is simple once you break it down.
It starts by pulling the signals it needs from the two evidence layers we already built in Part 1. Then it checks the thesis tags one by one. If the thesis is about controlled downside, it looks at drawdown. If it is about risk, it looks at volatility and beta. If it is about business quality, it leans on margins, returns on capital, growth, and revisions. If it is about valuation, it checks multiples like P/E and the relationship between forward and trailing valuation.
That is the key shift in this project. The copilot is no longer just collecting data. It is deciding which parts of the EODHD-backed signal set actually matter for the thesis in front of it.
The three output buckets are what make this useful.
evidence_for holds the points that support the claim.
evidence_against holds the points that weaken it.
missing_evidence makes the gaps explicit instead of letting the system sound more confident than it should.
That is what makes this feel like a thesis-testing workflow rather than a polished stock summary.
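To make the bucket idea concrete, here is a toy, self-contained version of just the drawdown check from the function above. The threshold and the signal values are the same style of assumptions the real function makes:

```python
def drawdown_bucket(dd, threshold=-0.15):
    """Sort a max-drawdown reading into one of the three evidence buckets."""
    if dd is None:
        return "missing", "No drawdown signal available to test downside control."
    if dd > threshold:
        return "for", f"Maximum drawdown was relatively contained at {dd:.2%}."
    return "against", f"Maximum drawdown reached {dd:.2%}, which weakens the controlled-downside claim."

print(drawdown_bucket(-0.11)[0])  # for
print(drawdown_bucket(-0.32)[0])  # against
print(drawdown_bucket(None)[0])   # missing
```

The same pattern repeats for every claim type: pick the relevant signal, compare it to a threshold, and route the sentence into one of the three buckets.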
Sanity Check (Jupyter Notebook)
Run this code inside test.ipynb for a quick sanity check:
import uuid
import json

from core import (
    fetch_prices,
    fetch_fundamentals,
    compute_price_signals,
    compute_fundamental_signals,
    classify_thesis,
    build_evidence_blocks,
    make_state,
)

trace_id = uuid.uuid4().hex[:10]
state = make_state()

thesis = "Apple looks attractive because downside has been controlled and business quality remains high."

prices = await fetch_prices("AAPL.US", "2026-01-01", "2026-04-01", trace_id, state)
funds = await fetch_fundamentals("AAPL.US", trace_id, state)

signals = compute_price_signals(prices)
fund_signals = compute_fundamental_signals(funds)

tags = classify_thesis(thesis)
evidence = build_evidence_blocks(thesis, tags, signals, fund_signals)

print(tags)
print(json.dumps(evidence, indent=2))
Expected Output:

Assigning a Verdict
Once the evidence is structured, the copilot still needs one more layer before it can write a memo. It needs a controlled way to label the thesis.
That is the job of decide_verdict(). It looks at how much evidence supports the thesis, how much weakens it, and whether the claim still depends on missing business-quality or valuation evidence. The goal here is not to create a perfect scoring model. It is to make sure the system does not jump from a few evidence strings straight into a confident conclusion.
Add this to core.py:
def decide_verdict(evidence, claim_types=None):
    claim_types = claim_types or []
    evidence_for = evidence.get("evidence_for", [])
    evidence_against = evidence.get("evidence_against", [])
    missing = evidence.get("missing_evidence", [])
    n_for = len(evidence_for)
    n_against = len(evidence_against)
    n_missing = len(missing)
    quality_claim = any(x in claim_types for x in ["business_quality", "weak_business_quality", "premium_justified", "premium_not_justified"])
    valuation_claim = any(x in claim_types for x in ["valuation_attractive", "valuation_expensive", "premium_justified", "premium_not_justified"])
    if n_for == 0 and n_against == 0:
        return {
            "verdict": "unresolved_due_to_missing_evidence",
            "reason": "There is not enough usable evidence to test the thesis.",
        }
    if quality_claim and n_missing >= 1:
        if n_against > 0:
            return {
                "verdict": "weakly_supported",
                "reason": "Some evidence supports the thesis, but direct business-quality evidence is missing and contradictory signals remain.",
            }
        return {
            "verdict": "partially_supported",
            "reason": "Part of the thesis is supported, but direct business-quality evidence is missing.",
        }
    if valuation_claim and n_missing >= 1:
        return {
            "verdict": "unresolved_due_to_missing_evidence",
            "reason": "The thesis depends on valuation evidence that is not available in this version.",
        }
    if n_for > 0 and n_against == 0:
        if n_missing >= 2:
            return {
                "verdict": "partially_supported",
                "reason": "The available evidence supports the thesis, but important evidence is still missing.",
            }
        return {
            "verdict": "supported",
            "reason": "The available evidence mainly supports the thesis.",
        }
    if n_against > 0 and n_for == 0:
        return {
            "verdict": "not_supported",
            "reason": "The available evidence mainly weakens the thesis.",
        }
    if n_for > n_against:
        return {
            "verdict": "partially_supported",
            "reason": "There is more supporting evidence than contradicting evidence, but the thesis is not fully confirmed.",
        }
    if n_against >= n_for:
        return {
            "verdict": "weakly_supported",
            "reason": "Contradicting evidence is meaningful enough that the thesis is only weakly supported.",
        }
    return {
        "verdict": "unresolved_due_to_missing_evidence",
        "reason": "The evidence is mixed and does not clearly resolve the thesis.",
    }
The logic here is intentionally simple. It does not try to do fine-grained scoring. Instead, it uses the shape of the evidence to decide whether the thesis is supported, partially supported, weakly supported, not supported, or still unresolved.
A couple of checks matter more than the rest. If the thesis depends on business-quality or valuation evidence and that evidence is still missing, the verdict gets capped early instead of sounding stronger than it should. That is important because a thesis can look convincing on price behavior alone, but still be incomplete if the claim depends on fundamentals that are not actually present.
The other useful thing about this function is that it returns both a short label and a reason. That makes the final output easier to understand later, and it also gives the memo-writing step something cleaner to work from than a bare category.
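The precedence is easiest to see with the counts alone. This is a minimal sketch of the clearest branches, reproduced for illustration only; it deliberately omits the missing-evidence caps the real function applies first:

```python
def simple_verdict(n_for, n_against):
    # Shape-of-the-evidence only; the real decide_verdict also caps the
    # verdict when quality or valuation evidence is missing.
    if n_for == 0 and n_against == 0:
        return "unresolved_due_to_missing_evidence"
    if n_for > 0 and n_against == 0:
        return "supported"
    if n_against > 0 and n_for == 0:
        return "not_supported"
    return "partially_supported" if n_for > n_against else "weakly_supported"

print(simple_verdict(3, 0))  # supported
print(simple_verdict(1, 2))  # weakly_supported
print(simple_verdict(0, 0))  # unresolved_due_to_missing_evidence
```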
Building the Facts Object
Before the memo gets written, the system first puts everything into one structured object. That object becomes the single source of truth for the final output. Instead of handing the model a mix of scattered variables, we give it one clean package containing the thesis, signals, company context, evidence, and verdict.
1. Company Context
We’ll start with a small helper that pulls the basic company context from the fundamentals payload.
Add this to core.py:
def extract_company_context(fundamentals):
    if not isinstance(fundamentals, dict):
        return {}
    gen = fundamentals.get("General", {}) or {}
    out = {
        "name": gen.get("Name"),
        "code": gen.get("Code"),
        "exchange": gen.get("Exchange"),
        "sector": gen.get("Sector"),
        "industry": gen.get("Industry"),
        "country": gen.get("CountryName"),
        "market_cap": gen.get("MarketCapitalization"),
        "pe_ratio": gen.get("PERatio"),
        "beta": gen.get("Beta"),
        "dividend_yield": gen.get("DividendYield"),
        "description": gen.get("Description"),
    }
    clean = {}
    for k, v in out.items():
        if v not in (None, "", "NA"):
            clean[k] = v
    return clean
This function is just a cleanup step. It gives us a compact company context block that can later sit alongside the price and fundamentals signals without dragging the full fundamentals payload into the memo layer.
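A quick self-contained demo of that cleanup, using a hypothetical "General" payload (the field names mirror the ones read above; the values are invented):

```python
fundamentals = {
    "General": {
        "Name": "Example Corp",   # kept
        "Sector": "Technology",   # kept
        "Beta": None,             # dropped: None
        "PERatio": "NA",          # dropped: the string "NA"
        "CountryName": "",        # dropped: empty string
    }
}
gen = fundamentals.get("General", {}) or {}
out = {
    "name": gen.get("Name"),
    "sector": gen.get("Sector"),
    "beta": gen.get("Beta"),
    "pe_ratio": gen.get("PERatio"),
    "country": gen.get("CountryName"),
}
clean = {k: v for k, v in out.items() if v not in (None, "", "NA")}
print(clean)  # {'name': 'Example Corp', 'sector': 'Technology'}
```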
2. Single-Stock Facts Builder
Now add the single-stock facts builder:
def build_thesis_facts(parsed, ticker, signals, fundamentals, thesis_tags, evidence):
    company = extract_company_context(fundamentals)
    facts = {
        "type": "single_name_thesis_test",
        "ticker": ticker,
        "lookback_days": parsed["lookback_days"],
        "thesis": parsed["thesis"],
        "thesis_summary": thesis_tags.get("summary", ""),
        "claim_types": thesis_tags.get("claim_types", []),
        "market_signals": {
            "ret_total": signals.get("ret_total"),
            "vol_annualized": signals.get("vol_annualized"),
            "max_drawdown": signals.get("max_drawdown"),
            "trend_slope": signals.get("trend_slope"),
            "ret_to_vol": signals.get("ret_to_vol"),
            "start_price": signals.get("start_price"),
            "end_price": signals.get("end_price"),
            "n_points": signals.get("n_points"),
        },
        "company_context": {
            "name": company.get("name"),
            "exchange": company.get("exchange"),
            "sector": company.get("sector"),
            "industry": company.get("industry"),
            "country": company.get("country"),
            "market_cap": company.get("market_cap"),
            "pe_ratio": company.get("pe_ratio"),
            "beta": company.get("beta"),
            "dividend_yield": company.get("dividend_yield"),
        },
        "description": company.get("description"),
        "evidence_for": evidence.get("evidence_for", []),
        "evidence_against": evidence.get("evidence_against", []),
        "missing_evidence": evidence.get("missing_evidence", []),
    }
    facts["verdict"] = decide_verdict(evidence, thesis_tags.get("claim_types", []))
    return facts
This is the main facts object for a single-stock thesis. It pulls together the parsed thesis, the market signals, the basic company context, the evidence buckets, and the verdict. At this point, the copilot has already done the reasoning work. The memo is not deciding anything new. It is just writing from this object.
3. Watchlist Facts Builder
Now add the watchlist version:
def build_watchlist_facts(parsed, tickers, signals_by_ticker, fundamentals_by_ticker, thesis_tags, evidence_by_ticker):
    per_ticker = {}
    for t in tickers:
        company = extract_company_context(fundamentals_by_ticker.get(t, {}))
        signals = signals_by_ticker.get(t, {})
        evidence = evidence_by_ticker.get(t, {})
        per_ticker[t] = {
            "company_context": {
                "name": company.get("name"),
                "sector": company.get("sector"),
                "industry": company.get("industry"),
                "market_cap": company.get("market_cap"),
                "pe_ratio": company.get("pe_ratio"),
                "beta": company.get("beta"),
            },
            "market_signals": {
                "ret_total": signals.get("ret_total"),
                "vol_annualized": signals.get("vol_annualized"),
                "max_drawdown": signals.get("max_drawdown"),
                "trend_slope": signals.get("trend_slope"),
                "ret_to_vol": signals.get("ret_to_vol"),
            },
            "evidence_for": evidence.get("evidence_for", []),
            "evidence_against": evidence.get("evidence_against", []),
            "missing_evidence": evidence.get("missing_evidence", []),
            "verdict": decide_verdict(evidence, thesis_tags.get("claim_types", [])),
        }
    facts = {
        "type": "watchlist_thesis_test",
        "tickers": tickers,
        "lookback_days": parsed["lookback_days"],
        "thesis": parsed["thesis"],
        "thesis_summary": thesis_tags.get("summary", ""),
        "claim_types": thesis_tags.get("claim_types", []),
        "per_ticker": per_ticker,
    }
    return facts
This version does the same thing, but across multiple tickers. Instead of one top-level evidence block, it stores a per-ticker structure so the memo layer can later compare names without needing to reconstruct anything.
That is the main reason this section matters. By the time we reach the memo step, we no longer want to pass loose values around. We want one structured object that already contains:
the thesis
the relevant signals
the company context
the evidence buckets
the verdict
That keeps the final writing step much cleaner and makes the whole workflow easier to debug.
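As a sketch of why the per-ticker layout helps, here is a hypothetical per_ticker fragment and a tiny comparison pass over it. The tickers and numbers are illustrative, not real output:

```python
# Hypothetical per_ticker fragment from a watchlist facts object.
per_ticker = {
    "AAPL.US": {"verdict": {"verdict": "partially_supported"},
                "market_signals": {"ret_to_vol": 0.31}},
    "MSFT.US": {"verdict": {"verdict": "supported"},
                "market_signals": {"ret_to_vol": 0.52}},
}

# Rank names by return-to-volatility without reconstructing anything.
ranked = sorted(per_ticker,
                key=lambda t: per_ticker[t]["market_signals"]["ret_to_vol"],
                reverse=True)
print(ranked)  # ['MSFT.US', 'AAPL.US']
```

Because every ticker carries the same keys, the memo layer can compare names with one-liners like this instead of re-fetching or re-deriving anything.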
Sanity Check (Jupyter Notebook)
Run this code inside test.ipynb for a quick sanity check:
from core import build_thesis_facts, extract_company_context
facts = build_thesis_facts(
    parsed={
        "tickers": ["AAPL"],
        "lookback_days": 180,
        "thesis": "Apple looks attractive because downside has been controlled and business quality remains high.",
        "mode": "single",
    },
    ticker="AAPL.US",
    signals=signals,
    fundamentals=funds,
    thesis_tags=tags,
    evidence=evidence,
)

print(json.dumps(facts, indent=2))
Expected Output:
{
  "type": "single_name_thesis_test",
  "ticker": "AAPL.US",
  "lookback_days": 180,
  "thesis": "Apple looks attractive because downside has been controlled and business quality remains high.",
  "thesis_summary": "Apple is attractive due to controlled downside and strong business quality",
  "claim_types": [
    "controlled_downside",
    "business_quality"
  ],
  "market_signals": {
    "ret_total": -0.05675067340688533,
    "vol_annualized": 0.2504818805125429,
    "max_drawdown": -0.11322450740687473,
    "trend_slope": -0.0005437843809243782,
    "ret_to_vol": -0.22656598270006817,
    "start_price": 271.01,
    "end_price": 255.63,
    "n_points": 62
  },
  "company_context": {
    "name": "Apple Inc",
    "exchange": "NASDAQ",
    "sector": "Technology",
    "industry": "Consumer Electronics",
    "country": "USA",
    "market_cap": null,
    "pe_ratio": null,
    "beta": null,
    "dividend_yield": null
  },
  "description": "Apple Inc. designs, manufactures, and markets smartphones, personal computers, tablets, wearables, and accessories worldwide. The company offers iPhone, a line of smartphones; Mac, a line of personal computers; iPad, a line of multi-purpose tablets; and wearables, home, and accessories comprising AirPods, Apple Vision Pro, Apple TV, Apple Watch, Beats products, and HomePod, as well as Apple branded and third-party accessories. It also provides AppleCare support and cloud services; and operates various platforms, including the App Store that allow customers to discover and download applications and digital content, such as books, music, video, games, and podcasts, as well as advertising services include third-party licensing arrangements and its own advertising platforms. In addition, the company offers various subscription-based services, such as Apple Arcade, a game subscription service; Apple Fitness+, a personalized fitness service; Apple Music, which offers users a curated listening experience with on-demand radio stations; Apple News+, a subscription news and magazine service; Apple TV, which offers exclusive original content and live sports; Apple Card, a co-branded credit card; and Apple Pay, a cashless payment service, as well as licenses its intellectual property. The company serves consumers, and small and mid-sized businesses; and the education, enterprise, and government markets. It distributes third-party applications for its products through the App Store. The company also sells its products through its retail and online stores, and direct sales force; and third-party cellular network carriers and resellers. The company was formerly known as Apple Computer, Inc. and changed its name to Apple Inc. in January 2007. Apple Inc. was founded in 1976 and is headquartered in Cupertino, California.",
  "evidence_for": [
    "Maximum drawdown was relatively contained at -11.32%."
  ],
  "evidence_against": [],
  "missing_evidence": [
    "This version does not include direct business-quality metrics such as margins, growth, cash flow, or return on capital.",
    "Only basic company context is available, which is not enough on its own to confirm business quality.",
    "No beta value available."
  ],
  "verdict": {
    "verdict": "partially_supported",
    "reason": "Part of the thesis is supported, but direct business-quality evidence is missing."
  }
}
Writing the Final Memo
At this point, the hard part is already done.
By the time we reach the memo step, the copilot already has a structured facts object with the thesis, claim types, market signals, company context, evidence buckets, and verdict. So this final function is not where the reasoning happens. It is just the presentation layer that turns that structured judgment into something readable.
Add this to core.py:
def write_thesis_memo(facts):
    prompt = f"""
You are writing a short financial research memo.
Write using only the facts provided below.
Do not invent numbers, events, comparisons, or opinions beyond the supplied evidence.
If evidence is missing, say so clearly.
Use this exact structure:
1. Thesis under review
2. Supporting evidence
3. Evidence that weakens the thesis
4. Missing evidence
5. Verdict
6. Bottom-line assessment
Style rules:
- Keep it concise
- Keep it analytical and professional
- No bullet points unless necessary
- No hype
- No generic investment disclaimer language
- The bottom-line assessment should be balanced and evidence-based
- The verdict section must explicitly use the supplied verdict
Facts:
{json.dumps(facts, indent=2, default=str)}
""".strip()
    r = oa.responses.create(
        model=model_name,
        input=[{"role": "user", "content": prompt}],
    )
    return r.output_text.strip()
This function keeps the model boxed into one narrow task. It is not being asked to look at raw price history, raw fundamentals, or scattered variables. It is being asked to write from one clean facts object that already contains the judgment. That separation matters because it keeps the final memo grounded. The model is not deciding what it thinks about the stock at the last second. It is simply turning the structured output of the earlier steps into a short research note.
The prompt is also deliberately strict. It fixes the memo structure, tells the model not to invent anything, and makes the verdict explicit instead of leaving it implied. That helps the final output stay consistent even when the underlying thesis changes.
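One small detail worth noticing in the call above is the `default=str` argument to `json.dumps`. The facts object can pick up values that are not natively JSON-serializable, and `default=str` keeps serialization from raising. A quick illustration with a hypothetical facts fragment:

```python
import json
from datetime import date

# Hypothetical facts fragment with a non-JSON-native value (a date).
facts = {"as_of": date(2026, 4, 1), "ret_total": -0.056}

# Without default=str this would raise TypeError; with it, the date
# is stringified and serialization succeeds.
print(json.dumps(facts, default=str))  # {"as_of": "2026-04-01", "ret_total": -0.056}
```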
Sanity Check (Jupyter Notebook)
You can test it with a facts object from the previous section:
from core import write_thesis_memo
memo = write_thesis_memo(facts)
print(memo)
Expected Output:

Stitching Everything Together
At this point, all the individual pieces are ready. We have the parser, the data fetchers, the signal builders, the thesis classifier, the evidence engine, the verdict layer, and the memo writer. The only thing left is to connect them into one end-to-end function.
Add this to core.py:
async def run_thesis_copilot(user_text):
    trace_id = uuid.uuid4().hex[:10]
    log_event("request_started", trace_id, text=user_text)
    parsed = enforce_limits(parse_request(user_text))
    tickers = parsed["tickers"]
    if not tickers:
        return {
            "memo": "No valid ticker was found in the request.",
            "facts": {},
            "data_used": {},
            "tool_trace_id": trace_id,
        }
    log_event(
        "parsed",
        trace_id,
        tickers=tickers,
        lookback_days=parsed["lookback_days"],
        mode=parsed["mode"],
        thesis=parsed["thesis"],
    )
    start_date, end_date = get_dates_from_lookback(parsed["lookback_days"])
    state = make_state()
    try:
        thesis_tags = classify_thesis(parsed["thesis"])
        if parsed["mode"] == "single":
            ticker = tickers[0]
            ticker_full = ticker if "." in ticker else f"{ticker}.US"
            log_event(
                "tool_phase",
                trace_id,
                mode="single",
                ticker=ticker_full,
                start_date=start_date,
                end_date=end_date,
            )
            prices = await fetch_prices(ticker_full, start_date, end_date, trace_id, state)
            funds = await fetch_fundamentals(ticker_full, trace_id, state)
            price_signals = compute_price_signals(prices)
            fundamental_signals = compute_fundamental_signals(funds)
            evidence = build_evidence_blocks(
                parsed["thesis"],
                thesis_tags,
                price_signals,
                fundamental_signals,
            )
            facts = build_thesis_facts(
                parsed,
                ticker_full,
                price_signals,
                funds,
                thesis_tags,
                evidence,
            )
            facts["fundamental_signals"] = fundamental_signals
            memo = write_thesis_memo(facts)
            out = {
                "memo": memo,
                "facts": facts,
                "data_used": {
                    "tickers": [ticker_full],
                    "date_range": [start_date, end_date],
                    "tools_called": [x.get("tool") for x in state["tool_trace"]],
                    "tool_calls": state["tool_calls"],
                },
                "tool_trace_id": trace_id,
            }
            log_event("request_finished", trace_id, tool_calls=state["tool_calls"])
            return out
        ticker_full = [x if "." in x else f"{x}.US" for x in tickers]
        log_event(
            "tool_phase",
            trace_id,
            mode="watchlist",
            tickers=ticker_full,
            start_date=start_date,
            end_date=end_date,
        )
        signals_by_ticker = {}
        funds_by_ticker = {}
        evidence_by_ticker = {}
        for t in ticker_full:
            prices = await fetch_prices(t, start_date, end_date, trace_id, state)
            funds = await fetch_fundamentals(t, trace_id, state)
            price_signals = compute_price_signals(prices)
            fundamental_signals = compute_fundamental_signals(funds)
            evidence = build_evidence_blocks(
                parsed["thesis"],
                thesis_tags,
                price_signals,
                fundamental_signals,
            )
            signals_by_ticker[t] = {
                **price_signals,
                "fundamental_signals": fundamental_signals,
            }
            funds_by_ticker[t] = funds
            evidence_by_ticker[t] = evidence
        facts = build_watchlist_facts(
            parsed,
            ticker_full,
            signals_by_ticker,
            funds_by_ticker,
            thesis_tags,
            evidence_by_ticker,
        )
        memo = write_thesis_memo(facts)
        out = {
            "memo": memo,
            "facts": facts,
            "data_used": {
                "tickers": ticker_full,
                "date_range": [start_date, end_date],
                "tools_called": [x.get("tool") for x in state["tool_trace"]],
                "tool_calls": state["tool_calls"],
            },
            "tool_trace_id": trace_id,
        }
        log_event("request_finished", trace_id, tool_calls=state["tool_calls"])
        return out
    except Exception as e:
        detail = repr(e)
        if hasattr(e, "exceptions"):
            detail = detail + " | " + " ; ".join([repr(x) for x in e.exceptions])
        log_event("request_failed", trace_id, err=detail)
        return {
            "memo": f"failed: {e}",
            "facts": {},
            "data_used": {
                "tickers": tickers,
                "date_range": [start_date, end_date],
                "tools_called": [x.get("tool") for x in state["tool_trace"]],
                "tool_calls": state["tool_calls"],
            },
            "tool_trace_id": trace_id,
        }
This function is just the full workflow in one place. It parses the request, fetches the data, computes the two signal layers, builds the evidence, assembles the facts object, writes the memo, and returns everything in a clean output.
The useful part is that it returns more than just the memo. It also returns the structured facts object, the tools that were used, the date range, and the trace ID. That keeps the final result inspectable instead of turning the copilot into a black box.
Demo Time! (Jupyter Notebook)
Demo 1. Testing Whether a Premium Is Actually Justified
This is a good first demo because it pushes the copilot beyond a basic single-stock check. The prompt is not asking whether NVIDIA is a good company in general. It is asking whether NVIDIA’s premium over AMD can actually be defended using market behavior and business quality.
Here is the prompt:
from core import run_thesis_copilot
q = """
Between NVDA and AMD, I think NVDA's premium is still justified by stronger market behavior and business quality.
Check that over the last 6 months.
""".strip()
result = await run_thesis_copilot(q)
print(result["memo"])
print(result["data_used"])
And here is the output:

What makes this output useful is that it does not flatten the result into a simple yes or no. NVIDIA clearly looks stronger on business quality, but market behavior is not as convincing, and the lack of direct valuation data stops the copilot from overclaiming.
That is the kind of behavior we want. The system is not just comparing two companies. It is testing whether the specific claim about a premium actually holds up.
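A quick aside on the top-level await in these cells: it works because Jupyter already runs an event loop. In a plain Python script there is no running loop, so the same call would be wrapped in asyncio.run. A minimal sketch, with a stub standing in for the real run_thesis_copilot from core.py so the snippet is self-contained:

```python
import asyncio

# Stub standing in for run_thesis_copilot so this sketch runs on its own;
# the real coroutine lives in core.py and returns the full output dict.
async def run_thesis_copilot(user_text):
    return {"memo": f"memo for: {user_text}", "data_used": {}}

q = "TSLA feels too volatile for the underlying business quality."
# asyncio.run starts an event loop, runs the coroutine to completion, and closes the loop.
result = asyncio.run(run_thesis_copilot(q))
print(result["memo"])
```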
Demo 2. Testing Whether Volatility Is Too High for the Underlying Business
The second demo shifts back to a single-stock thesis, but the claim is different. This time, the question is not whether the company looks attractive. It is whether the stock is more volatile than the underlying business quality would justify.
Here is the prompt:
q = """
TSLA feels too volatile for the underlying business quality.
Test that thesis over the last year.
""".strip()
result = await run_thesis_copilot(q)
print(result["memo"])
print(result["data_used"])
And here is the output:

This result is useful because it shows a more conflicted thesis. Tesla’s recent returns and forward growth expectations offer some support, but the current profitability, recent operating trends, revisions, and volatility profile all push back against the idea that the business quality is strong enough to fully justify that risk.
So the final verdict lands where it should: not as a clean confirmation, but as a weakly supported thesis.
Final Thoughts
At this point, the copilot already does the most important part well. It can take a natural-language thesis, pull the right market and fundamentals data through EODHD’s MCP layer, turn those inputs into structured evidence, and return a research memo that is much more disciplined than a normal stock summary.
At the same time, this version still has clear limits. It does not yet go deeper into statement-level accounting logic, it does not use news or catalyst context, and its handling of relative valuation could still be stronger for more demanding comparison cases. But even with those limits, the shift here is already meaningful. The real change was not just connecting a model to financial data. It was moving from summarizing stocks to testing claims.