Building A Market Research Copilot using MCP and Python
By Nikhil Adithyan
Part 1: setting up MCP, parsing the thesis, and building the data pipeline

Most financial AI tools are good at one thing: summarizing a stock. You ask about Apple, NVIDIA, or Tesla, and they give you a clean overview of price action, a few ratios, and maybe some company context. That can be useful, but it falls short the moment the task becomes more like real research.
Real research usually starts with a view. Not a ticker. A trader, analyst, or product team is more likely to ask something like, “Apple looks attractive because downside has been controlled and business quality remains high. Does the data actually support that?” That is a different problem. A summary cannot answer it properly because the system needs to test the claim itself, not just describe the company around it.
In this tutorial, we are going to build a financial research copilot that does exactly that. It takes a natural-language thesis, pulls historical prices and fundamentals through EODHD’s MCP server, turns those inputs into structured evidence, and returns a short research memo with a verdict.
Note: This is a two-part tutorial. In this first part, we’ll set up the MCP client, parse a natural-language thesis, fetch historical and fundamentals data, and turn both into structured evidence layers. In Part 2, we’ll use those signals to build support and contradiction, assign a verdict, and generate the final research memo.
What This Copilot Actually Produces
Before getting into the pipeline, it helps to see the kind of output we are building toward. The easiest way to understand this project is to look at one real example.
Suppose the user gives the system this prompt:
I think Apple looks attractive because downside has been controlled and business quality remains high. Can you test that for AAPL over the last 180 days?
The copilot does not respond with a loose summary of Apple. It turns that into a structured research memo:
1. Thesis under review
Apple appears attractive due to controlled downside and sustained high business quality.
2. Supporting evidence
Over the past 180 days, maximum drawdown was limited to -13.82%, suggesting relatively contained downside. Profitability metrics are strong, with a 35.37% operating margin and 27.04% profit margin. Returns on capital are high, with ROA at 24.38% and ROE at 152.02%, indicating efficient asset use and strong capital efficiency. Growth metrics support ongoing business strength, with quarterly revenue growth of 15.70% and earnings growth of 18.30% year-over-year. Forward estimates also remain positive, with expected earnings growth of 9.68% and revenue growth of 6.87%.
3. Evidence that weakens the thesis
Net EPS revisions over the past 30 days are negative (-3), indicating some deterioration in analyst sentiment.
4. Missing evidence
No material gaps in the provided dataset.
5. Verdict
partially_supported — There is more supporting evidence than contradicting evidence, but the thesis is not fully confirmed.
6. Bottom-line assessment
Apple demonstrates strong and consistent business quality supported by high margins, returns, and continued growth. Downside has been relatively contained over the observed period, though not negligible. However, negative earnings revisions introduce some caution, leaving the thesis supported but not conclusively established.
This example makes the goal of the project much clearer. We are not building a system that simply tells us what happened to Apple. We are building one that takes a claim, checks it against market and fundamentals data, and returns a structured judgment.
That distinction matters because the memo is only the final surface. Underneath it, the system first parses the thesis, pulls prices and fundamentals through EODHD’s MCP server, computes the relevant signals, builds support and contradiction, assigns a verdict, and only then writes the final note. That is what gives the output its structure.
In this first part, we’ll build everything up to the evidence layers that power this kind of output.
What Makes This Different from a Normal Stock Assistant

A normal stock assistant starts with a ticker and tries to explain what happened. It may summarize price action, mention a few ratios, and add some company context. That is useful when the question is broad, but it is not enough when the input is a specific investment view.
This project starts from the opposite direction. The input is not “tell me about Apple.” The input is a claim, like “Apple looks attractive because downside has been controlled and business quality remains high.” That changes the job of the system. It now has to test each part of that claim, decide what supports it, decide what weakens it, and be clear about what is still missing.
That one shift is what shapes the whole workflow. Instead of ending at retrieval and summarization, the pipeline has to parse the thesis, map the data to the right kind of evidence, and return a verdict. That is what makes this feel like a research copilot rather than a better stock summary tool.
The Workflow
At a high level, the copilot follows a simple sequence:
parse the user’s thesis into a structured request
fetch historical prices and fundamentals through MCP
turn those inputs into market and business signals
map those signals into support, contradiction, and missing evidence
assign a verdict
write the final memo
That is the full loop. The output may look like a short research note, but it sits on top of a more controlled pipeline in core.py.
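To make that loop concrete, here is a minimal sketch of how the Part 1 pieces will chain together once we have built them below. The function name run_pipeline is illustrative, and the evidence, verdict, and memo steps are left for Part 2.
# Illustrative sketch only: run_pipeline is a hypothetical name, and the
# evidence/verdict/memo steps are built in Part 2.
import uuid

async def run_pipeline(user_text):
    trace_id = str(uuid.uuid4())
    state = make_state()
    # Step 1: turn the raw prompt into a structured, bounded request.
    req = enforce_limits(parse_request(user_text))
    start, end = get_dates_from_lookback(req["lookback_days"])
    ticker = req["tickers"][0]  # single-ticker mode, for brevity
    # Step 2: fetch both data sources through MCP.
    prices = await fetch_prices(ticker, start, end, trace_id, state)
    funda = await fetch_fundamentals(ticker, trace_id, state)
    # Step 3: reduce raw data to explicit signals.
    return {
        "market": compute_price_signals(prices),
        "business": compute_fundamental_signals(funda),
    }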
Project structure
project/
├── client.py
├── core.py
└── test.ipynb
client.py is the MCP access layer. It connects to EODHD, lists tools, calls them with retries and timeouts, and returns metadata for each request. core.py contains the actual thesis-testing logic, including parsing, data fetching, signal computation, evidence building, verdict assignment, and memo generation. test.ipynb is where the quality checks and end-to-end demos are run.
This split is useful because it keeps the tutorial easy to follow. When we move into code, each block has a clear place. MCP access stays in client.py, while the research workflow stays in core.py.
Building the MCP Client
We’ll start with the thinnest part of the project, which is the MCP access layer.
This file only does one job. It connects to EODHD’s MCP server, lists available tools, calls a tool with retries and a timeout, and returns a small metadata object alongside the response. The actual thesis logic does not belong here. Keeping this layer small makes the rest of the project much easier to reason about later.
Create a file called client.py and add this:
import time
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client


class EODHDMCP:
    def __init__(self, apikey, base_url=None):
        self.apikey = apikey
        self.base_url = base_url or "https://mcp.eodhd.dev/mcp"
        self._tools = None  # cached tool names

    def _url(self):
        return f"{self.base_url}?apikey={self.apikey}"

    def _open(self):
        return streamablehttp_client(self._url())

    async def list_tools(self):
        # Cache the tool list so repeated calls skip the network round trip.
        if self._tools is not None:
            return self._tools
        async with self._open() as (read, write, _):
            async with ClientSession(read, write) as s:
                await s.initialize()
                resp = await s.list_tools()
                self._tools = [t.name for t in resp.tools]
                return self._tools

    async def call_tool(self, name, args, trace_id, timeout_s=25, retries=2):
        last = None
        for attempt in range(retries + 1):
            t0 = time.time()
            try:
                async with self._open() as (read, write, _):
                    async with ClientSession(read, write) as s:
                        await s.initialize()
                        out = await asyncio.wait_for(s.call_tool(name, args), timeout=timeout_s)
                        dt = time.time() - t0
                        meta = {
                            "trace_id": trace_id,
                            "tool": name,
                            "args": args,
                            "latency_s": round(dt, 3),
                        }
                        return out, meta
            except Exception as e:
                last = e
                if attempt < retries:
                    # Linear backoff before retrying.
                    await asyncio.sleep(0.5 * (attempt + 1))
        raise last
There are only two methods that really matter here. list_tools() is just a quick way to inspect and cache the tools exposed by the MCP server. call_tool() is the method the rest of the project will actually use. It makes the request, applies timeout and retry handling, and returns both the raw output and a small metadata object.
That metadata becomes useful later because the workflow stays traceable. When the copilot returns a memo, we still know which tool was called, with what arguments, and how long it took. So even though this file is small, it gives the rest of the system a clean and inspectable access layer.
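As a quick sanity check, here is a minimal sketch of how the rest of the project will use this class. The arguments mirror the fetch functions we write later, and the ticker, dates, and trace id are just placeholders.
import asyncio
from client import EODHDMCP

async def main():
    mcp = EODHDMCP("your eodhd api key")
    print(await mcp.list_tools())  # inspect what the server exposes
    out, meta = await mcp.call_tool(
        "get_historical_stock_prices",
        {"ticker": "AAPL", "start_date": "2025-01-01", "end_date": "2025-06-30",
         "period": "d", "order": "a", "fmt": "json"},
        trace_id="demo-1",
    )
    print(meta)  # tool name, args, and latency for this call

asyncio.run(main())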
Setting Up core.py
Now that the MCP client is ready, we can start building the main workflow in core.py.
This file will hold the actual thesis-testing logic, so the first step is to set up the imports, API clients, a few limits, and some small helper functions that the rest of the pipeline will reuse.
Create a file called core.py and start with this:
import json
import re
import time
import uuid
import asyncio
from datetime import date, timedelta

import numpy as np
import pandas as pd
from openai import OpenAI

from client import EODHDMCP

eodhd_api_key = "your eodhd api key"
mcp_base_url = "https://mcp.eodhd.dev/mcp"

openai_api_key = "your openai api key"
model_name = "gpt-5.3-chat-latest"

# Hard limits that keep a single run bounded.
max_lookback_days = 365
max_tool_calls = 10
max_tickers = 5

mcp = EODHDMCP(eodhd_api_key, base_url=mcp_base_url)
oa = OpenAI(api_key=openai_api_key)


def log_event(event, trace_id, **extra):
    # Lightweight structured logging: one JSON line per event.
    payload = {
        "event": event,
        "trace_id": trace_id,
        "ts": round(time.time(), 3),
    }
    payload.update(extra)
    print(json.dumps(payload, default=str))


def get_dates_from_lookback(days):
    # Convert a lookback window in days into ISO start/end dates.
    end = date.today()
    start = end - timedelta(days=int(days))
    return start.isoformat(), end.isoformat()


def make_state():
    return {
        "tool_calls": 0,
        "tool_trace": [],
    }


def bump_tool_call(state, meta):
    # Record each MCP call and enforce the per-run budget.
    state["tool_calls"] += 1
    state["tool_trace"].append(meta)
    if state["tool_calls"] > max_tool_calls:
        raise RuntimeError("tool call budget exceeded")


def to_text(out):
    # Flatten an MCP tool response into plain text for json.loads().
    if isinstance(out, str):
        return out.strip()
    if hasattr(out, "content"):
        try:
            parts = []
            for item in out.content:
                if hasattr(item, "text") and item.text is not None:
                    parts.append(item.text)
                else:
                    parts.append(str(item))
            return "\n".join(parts).strip()
        except Exception:
            pass
    return str(out).strip()
Note: Replace “your eodhd api key” with your actual EODHD API key. If you don’t have one, you can obtain it by opening an EODHD developer account.
This block does three things:
First, it sets up the two clients we need. mcp is the EODHD MCP client from client.py, and oa is the OpenAI client that will be used for parsing and memo generation later.
Second, it defines a few small limits for the workflow. These help keep the system controlled by capping the lookback window, the number of tickers, and the number of tool calls in a single run.
Third, it adds helper functions that the rest of the file depends on. log_event() gives us lightweight tracing, get_dates_from_lookback() converts a lookback window into start and end dates, make_state() and bump_tool_call() help track MCP usage, and to_text() safely converts tool output into plain text before we parse it.
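For example, a quick check of the helpers (the dates depend on the day you run it):
start, end = get_dates_from_lookback(180)
print(start, end)  # e.g. 2025-05-30 2025-11-26, depending on today's date

state = make_state()
bump_tool_call(state, {"trace_id": "demo", "tool": "get_fundamentals_data"})
print(state["tool_calls"])  # 1; the 11th call in a run raises RuntimeError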
Parsing a Research Prompt into a Structured Request
The first thing this copilot needs to do is clean up the input. A user is not going to send a perfectly formatted request every time. They are more likely to write a research thought in plain English and mix the thesis, ticker, and timeframe into one prompt.
That is why the system starts by turning the raw prompt into four fields:
ticker
lookback window
thesis
mode
This logic goes into core.py.
def parse_request(text):
    # Ask the model for structured extraction only; no analysis.
    prompt = f"""
You are extracting fields for a financial thesis-testing copilot.
Return only valid JSON with this exact shape:
{{
  "tickers": ["AAPL"],
  "lookback_days": 180,
  "thesis": "the actual thesis statement",
  "mode": "single"
}}
Rules:
- Extract only tickers explicitly mentioned or strongly implied.
- Do not invent tickers.
- If there are multiple tickers, mode must be "watchlist".
- If there is one ticker, mode must be "single".
- If no timeframe is mentioned, use 180.
- Convert months to days using 30 days per month.
- Convert years to days using 365 days per year.
- Keep the thesis concise but faithful to the user's intent.
- Return JSON only. No markdown. No explanation.
User request:
{text}
""".strip()
    r = oa.responses.create(
        model=model_name,
        input=[{"role": "user", "content": prompt}],
    )
    raw = r.output_text.strip()
    try:
        parsed = json.loads(raw)
    except Exception:
        raise RuntimeError(f"parser returned non-json text: {raw[:500]}")
    return parsed
This function gives the model one very narrow job. It is not asking for an opinion or analysis. It is only asking for structured extraction. That matters because we want flexibility at the input layer, but we do not want the whole workflow to become fuzzy.
Once the model returns that JSON, Python takes over and tightens it up.
def enforce_limits(parsed):
    # Clean and cap the tickers.
    tickers = parsed.get("tickers", [])
    if not isinstance(tickers, list):
        tickers = []
    tickers = [str(x).upper().strip() for x in tickers if str(x).strip()]
    tickers = tickers[:max_tickers]

    # Clamp the lookback window to a sane range.
    lookback_days = parsed.get("lookback_days", 180)
    try:
        lookback_days = int(lookback_days)
    except Exception:
        lookback_days = 180
    if lookback_days < 1:
        lookback_days = 1
    if lookback_days > max_lookback_days:
        lookback_days = max_lookback_days

    thesis = str(parsed.get("thesis", "")).strip()
    if not thesis:
        thesis = "No thesis provided."

    # The mode is derived from the ticker count, not trusted from the model.
    mode = parsed.get("mode", "single")
    if len(tickers) > 1:
        mode = "watchlist"
    else:
        mode = "single"

    return {
        "tickers": tickers,
        "lookback_days": lookback_days,
        "thesis": thesis,
        "mode": mode,
    }
This second function is what keeps the workflow controlled. It cleans the tickers, caps how many we allow in one request, clamps the time window, and makes sure the mode matches the number of tickers. So the model gives us flexibility, while the code gives us boundaries. That combination is important for a build like this.
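Putting the two together on the example prompt from earlier looks like this. The exact thesis wording depends on the model, so the output shown is only indicative.
raw = parse_request(
    "I think Apple looks attractive because downside has been controlled "
    "and business quality remains high. Can you test that for AAPL over "
    "the last 180 days?"
)
req = enforce_limits(raw)
# Roughly: {"tickers": ["AAPL"], "lookback_days": 180,
#           "thesis": "Apple looks attractive because ...", "mode": "single"}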
Fetching the Two Data Sources: Historical & Fundamental Data
Once the request is parsed, the next step is to pull the data that will feed the rest of the workflow. For this version, we only use two sources from EODHD: historical prices and fundamentals. That is enough to test a surprising number of thesis types without making the build unnecessarily wide.
Add these two functions to core.py:
async def fetch_prices(ticker, start_date, end_date, trace_id, state):
    args = {
        "ticker": ticker,
        "start_date": start_date,
        "end_date": end_date,
        "period": "d",
        "order": "a",
        "fmt": "json",
    }
    out, meta = await mcp.call_tool("get_historical_stock_prices", args, trace_id)
    text = to_text(out)
    bump_tool_call(state, meta)
    if not text:
        raise RuntimeError("empty response from get_historical_stock_prices")
    try:
        data = json.loads(text)
    except Exception:
        raise RuntimeError(f"price tool returned non-json text: {text[:300]}")
    if isinstance(data, dict) and data.get("error"):
        raise RuntimeError(data["error"])
    df = pd.DataFrame(data)
    if df.empty:
        return df
    # Keep only the fields the signal layer needs.
    keep = [c for c in ["date", "close"] if c in df.columns]
    df = df[keep].copy()
    df["ticker"] = ticker
    return df


async def fetch_fundamentals(ticker, trace_id, state):
    args = {
        "ticker": ticker,
        "include_financials": False,
        "fmt": "json",
    }
    out, meta = await mcp.call_tool("get_fundamentals_data", args, trace_id)
    text = to_text(out)
    bump_tool_call(state, meta)
    if not text:
        raise RuntimeError("empty response from get_fundamentals_data")
    try:
        data = json.loads(text)
    except Exception:
        raise RuntimeError(f"fundamentals tool returned non-json text: {text[:300]}")
    if isinstance(data, dict) and data.get("error"):
        raise RuntimeError(data["error"])
    # Keep the payload as raw JSON; later sections extract what they need.
    return data
fetch_prices() pulls daily historical data for the requested window and reduces it to the fields we actually need right now: date, close, and the ticker itself. That trimmed DataFrame is what we will later use for return, drawdown, volatility, trend, and other market signals.
fetch_fundamentals() keeps the fundamentals payload as JSON because we will extract different categories from it in the next sections, including margins, growth, valuation, revisions, and beta.
A couple of details matter here. Both functions run through the same MCP wrapper, so they automatically inherit the timeout, retry, and metadata handling we already built in client.py. Both also call bump_tool_call(), which lets us track how many external calls were made during a single run. That becomes useful later when we want the workflow to stay inspectable rather than feel like a black box.
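A minimal sketch of fetching both sources for one ticker, assuming a valid API key is configured above:
async def demo_fetch():
    state = make_state()
    start, end = get_dates_from_lookback(180)
    prices = await fetch_prices("AAPL", start, end, "demo-2", state)
    funda = await fetch_fundamentals("AAPL", "demo-2", state)
    print(prices.tail(3))       # date, close, ticker
    print(state["tool_calls"])  # 2 calls consumed from the budget
    return prices, funda

prices_df, fundamentals = asyncio.run(demo_fetch())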
Building the First Evidence Layer from Price Data
Once the price data is in, the next step is to turn that raw series into something we can actually reason with. For this copilot, price history is not the final answer, but it is still the first evidence layer. It helps us test claims around downside control, risk, momentum, and the quality of returns.
Add this to core.py:
def compute_price_signals(prices_df):
    if prices_df is None or prices_df.empty:
        return {}
    df = prices_df.copy()
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df["close"] = pd.to_numeric(df["close"], errors="coerce")
    df = df.dropna(subset=["date", "close"]).sort_values("date")
    if df.empty:
        return {}

    close = df["close"]
    rets = close.pct_change().dropna()
    out = {
        "n_points": int(len(close)),
        "start_price": float(close.iloc[0]),
        "end_price": float(close.iloc[-1]),
    }

    # Total return over the full window.
    if len(close) >= 2:
        out["ret_total"] = float(close.iloc[-1] / close.iloc[0] - 1)

    # Daily and annualized volatility, plus a simple return-to-risk ratio.
    if not rets.empty:
        vol_daily = float(rets.std())
        vol_annualized = float(vol_daily * np.sqrt(252))
        out["vol_daily"] = vol_daily
        out["vol_annualized"] = vol_annualized
        if vol_annualized > 0 and "ret_total" in out:
            out["ret_to_vol"] = float(out["ret_total"] / vol_annualized)

    # Worst peak-to-trough decline over the window.
    peak = close.cummax()
    drawdown = close / peak - 1
    out["max_drawdown"] = float(drawdown.min())

    # Slope of a linear fit on log prices as a crude trend measure.
    logp = np.log(close.values)
    x = np.arange(len(logp))
    if len(logp) >= 3:
        out["trend_slope"] = float(np.polyfit(x, logp, 1)[0])
    else:
        out["trend_slope"] = 0.0
    return out
This function gives us a compact set of market signals from a plain close-price series. ret_total tells us how the stock moved over the full window. vol_annualized tells us how noisy that move was. max_drawdown is useful when the thesis talks about downside control. trend_slope gives us a simple directional measure, and ret_to_vol helps us judge return quality instead of looking at raw return alone.
The important point here is that we are not asking the model to infer all of this from raw prices. We compute it first in Python, so the later reasoning step starts from explicit signals rather than vague interpretation. That makes the whole workflow much more stable.
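A tiny synthetic example makes the outputs easy to verify by hand (the prices are made up):
demo = pd.DataFrame({
    "date": ["2025-01-01", "2025-01-02", "2025-01-03", "2025-01-06"],
    "close": [100.0, 104.0, 99.0, 106.0],
})
sig = compute_price_signals(demo)
print(round(sig["ret_total"], 3))     # 0.06   -> 106 / 100 - 1
print(round(sig["max_drawdown"], 3))  # -0.048 -> 99 / 104 - 1, worst peak-to-trough drop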
Building the Second Evidence Layer from Fundamentals
Price data gives us one side of the thesis. The second side comes from fundamentals. This is the part that made the project stop sounding generic. Once the copilot started treating fundamentals as actual evidence, instead of just company profile data, the outputs became much more useful.
Add this helper first in core.py:
def _to_float(x):
    # Normalize strings, nulls, and "NA" sentinels into float or None.
    if x in (None, "", "NA"):
        return None
    try:
        return float(x)
    except Exception:
        return None
This small function just cleans values before we use them. Fundamentals payloads often contain strings, nulls, or "NA", so it helps to normalize everything early.
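A few quick examples of what it handles:
print(_to_float("35.37"))  # 35.37
print(_to_float("NA"))     # None (explicit sentinel)
print(_to_float(None))     # None
print(_to_float("n/a"))    # None (float() fails, caught by the except)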
Now add the main function:
def compute_fundamental_signals(fundamentals):
    if not isinstance(fundamentals, dict):
        return {}
    general = fundamentals.get("General", {}) or {}
    highlights = fundamentals.get("Highlights", {}) or {}
    valuation = fundamentals.get("Valuation", {}) or {}
    technicals = fundamentals.get("Technicals", {}) or {}
    earnings = fundamentals.get("Earnings", {}) or {}
    trend = earnings.get("Trend", {}) or {}

    # Pick the most recent entry from the earnings trend block.
    if isinstance(trend, dict) and trend:
        latest_key = sorted(trend.keys())[-1]
        latest_trend = trend.get(latest_key, {}) or {}
    else:
        latest_trend = {}

    out = {
        "sector": general.get("Sector"),
        "industry": general.get("Industry"),
        "employees": _to_float(general.get("FullTimeEmployees")),
        "market_cap": _to_float(highlights.get("MarketCapitalization")),
        "pe_ratio": _to_float(highlights.get("PERatio")),
        "peg_ratio": _to_float(highlights.get("PEGRatio")),
        "profit_margin": _to_float(highlights.get("ProfitMargin")),
        "operating_margin": _to_float(highlights.get("OperatingMarginTTM")),
        "roa": _to_float(highlights.get("ReturnOnAssetsTTM")),
        "roe": _to_float(highlights.get("ReturnOnEquityTTM")),
        "revenue_ttm": _to_float(highlights.get("RevenueTTM")),
        "revenue_growth_yoy": _to_float(highlights.get("QuarterlyRevenueGrowthYOY")),
        "earnings_growth_yoy": _to_float(highlights.get("QuarterlyEarningsGrowthYOY")),
        "dividend_yield": _to_float(highlights.get("DividendYield")),
        "trailing_pe": _to_float(valuation.get("TrailingPE")),
        "forward_pe": _to_float(valuation.get("ForwardPE")),
        "price_sales": _to_float(valuation.get("PriceSalesTTM")),
        "price_book": _to_float(valuation.get("PriceBookMRQ")),
        "ev_revenue": _to_float(valuation.get("EnterpriseValueRevenue")),
        "ev_ebitda": _to_float(valuation.get("EnterpriseValueEbitda")),
        "beta": _to_float(technicals.get("Beta")),
        "earnings_estimate_growth": _to_float(latest_trend.get("earningsEstimateGrowth")),
        "revenue_estimate_growth": _to_float(latest_trend.get("revenueEstimateGrowth")),
        "eps_revisions_up_30d": _to_float(latest_trend.get("epsRevisionsUpLast30days")),
        "eps_revisions_down_30d": _to_float(latest_trend.get("epsRevisionsDownLast30days")),
    }

    # Derived fields: valuation direction and net analyst revisions.
    if out["trailing_pe"] is not None and out["forward_pe"] is not None:
        out["forward_vs_trailing_pe_change"] = out["forward_pe"] - out["trailing_pe"]
    if out["eps_revisions_up_30d"] is not None and out["eps_revisions_down_30d"] is not None:
        out["net_eps_revisions_30d"] = out["eps_revisions_up_30d"] - out["eps_revisions_down_30d"]
    return out
This function pulls together the parts of the fundamentals payload that matter most for thesis testing.
From Highlights, we get profitability, returns on capital, growth, and market cap. From Valuation, we get multiples like trailing P/E, forward P/E, price-to-sales, and EV-based ratios.
From Technicals, we take beta.
From Earnings.Trend, we pick up forward estimate growth and revision data.
These are the fields that let us test claims around business quality, premium justification, valuation, and forward expectations in a much more concrete way.
The last two derived fields are also useful. The gap between forward P/E and trailing P/E gives us a quick way to see whether valuation is easing or staying stretched. Net EPS revisions over the last 30 days tell us whether analyst expectations are improving or deteriorating.
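To see both derived fields in action, here is the function run on a minimal hand-built payload. The numbers are illustrative, not real AAPL data, and only the keys the function reads are included.
fake = {
    "General": {"Sector": "Technology"},
    "Highlights": {"OperatingMarginTTM": "0.3537"},
    "Valuation": {"TrailingPE": "33.0", "ForwardPE": "29.5"},
    "Technicals": {"Beta": "1.1"},
    "Earnings": {"Trend": {"2026-12-31": {
        "epsRevisionsUpLast30days": "2",
        "epsRevisionsDownLast30days": "5",
    }}},
}
sig = compute_fundamental_signals(fake)
print(sig["forward_vs_trailing_pe_change"])  # -3.5 -> forward multiple below trailing
print(sig["net_eps_revisions_30d"])          # -3.0 -> more downgrades than upgrades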
Wrapping Up Part 1
At this point, the copilot can already do a lot of the heavy lifting. It can take a natural-language thesis, turn it into a structured request, pull historical prices and fundamentals through MCP, and convert both into reusable signal layers.
That gives us the raw material we need for the actual reasoning step. We now have price-based signals like return, volatility, drawdown, trend, and return quality, along with fundamental signals covering margins, capital efficiency, growth, valuation, revisions, and beta.
In Part 2, we’ll take those signals and turn them into something more useful: supporting evidence, weakening evidence, missing evidence, a verdict, and the final research memo. That is where the copilot stops looking like a data pipeline and starts behaving like a real thesis-testing system.
Stay tuned for the second part!