  • Nikhil Adithyan

Real-Time Stock News Sentiment Prediction with Python

Updated: Nov 28, 2023

An interesting practical application of AI & ML in the world of stock trading

Disclaimer: All the content featured in this article is strictly for educational purposes and should not be taken as a form of investment advice.


Understanding how people feel about a particular stock is crucial in predicting its future prices. One of the best ways to gauge public sentiment is by monitoring stock news. However, analyzing whether a news piece has a positive or negative impact on a stock’s value can be quite challenging. But fear not, we’ve got a solution!

In this article, we’re going to dive into a model that predicts, in real time, whether the sentiment of a news piece is positive or negative. We’ve harnessed the power of Financial Modeling Prep’s (FMP) Stock News Sentiment API, which provides high-quality, well-labeled data with sentiment scores.

This incredible dataset, available through FMP’s API, has a wide range of applications, and one of the most intriguing is predicting sentiment in news pieces, which we’ll explore right here.

Let’s get started on this exciting journey!

Importing the Tools

To make this magic happen, we’ll need some tools. We’ll import the libraries that help us process and analyze the data. It’s like getting our gear ready before a big exploration.

import pandas as pd
import requests
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import LabelEncoder
from sklearn import metrics
  1. requests: Provides a simple API for interacting with HTTP operations such as GET, POST, etc.

  2. pandas: Pandas is a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool.

  3. CountVectorizer: It is a tool for converting text data into a numerical format suitable for machine learning.

  4. train_test_split: It’s a function to split a dataset into training and testing subsets for model evaluation and testing.

  5. XGBClassifier: A classifier based on the XGBoost algorithm, known for its high performance in gradient boosting.

  6. BernoulliNB: A classifier implementing the Bernoulli Naive Bayes algorithm for binary feature data.

  7. LabelEncoder: Converts categorical labels into numerical values for the model to understand.

  8. metrics: Provides various functions and tools for evaluating the performance of the models.

If you haven’t installed any of the imported packages, make sure to do so using the pip command in your terminal. Before moving further, to extract the data, we’ll be using the APIs of FinancialModelingPrep. So for the smooth execution of the upcoming code block, you need to have an FMP developer account which you can easily create using the link here.

Accessing the Treasure of News using FMP

Before we embark on this adventure, we need to access the wealth of news data available through FMP’s Stock News Sentiment API.

api_key = 'YOUR API KEY'
url = f'{api_key}'
response = requests.get(url).json()
data = pd.DataFrame(response)

The code is pretty simple and clear. We first store the API key (make sure to replace YOUR API KEY with your FMP API key) and the API URL in their respective variables, then make an API call to extract the news sentiment data. Finally, we convert the JSON response into a dataframe.

The endpoint returns the most recent stock news, with each item classified as Positive, Neutral, or Negative. The data also includes the sentiment score and the site each article came from, so every item can be traced back to its source.
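Before modeling, it can help to check how the three labels are distributed. Here is a minimal sketch, assuming the response rows carry the `sentiment` and `text` columns that the later code relies on. The sample records are made up for illustration and are not real FMP output.

```python
import pandas as pd

# Hypothetical records standing in for the API response (not real FMP data)
sample_response = [
    {"text": "Company beats earnings estimates", "sentiment": "Positive"},
    {"text": "Shares slide after weak guidance", "sentiment": "Negative"},
    {"text": "Board schedules annual meeting", "sentiment": "Neutral"},
    {"text": "Analysts raise price target", "sentiment": "Positive"},
]

sample = pd.DataFrame(sample_response)

# Share of each label; a heavily skewed split is worth knowing about early
label_share = sample["sentiment"].value_counts(normalize=True)
print(label_share)
```

Running the same `value_counts` call on the real `data` frame would show whether one class dominates the others.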

Creating the Dataset

Here’s where we assemble the raw data. We’re going to collect news articles and their corresponding sentiment scores. Think of this step as gathering all the puzzle pieces before putting them together.

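api_key = 'YOUR API KEY'

# Fetch pages 0 through 99 and fold each one into the running dataset
for i in range(0, 100):
    url = f'{i}&apikey={api_key}'
    response = requests.get(url).json()
    df = pd.DataFrame(response)
    # An outer merge on the shared columns keeps rows from both frames
    data = data.merge(df, how='outer')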

We build the training and testing data by stepping the page number from 0 through 99. Each page contains 100 data points, so the dataset contains 10,000 rows and 9 columns. On each pass through the loop, the new page is merged with the data collected so far.
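An alternative way to write the collection loop, sketched here under some assumptions, is to gather each page in a list and combine them once at the end with `pd.concat`. The `collect_pages` helper and the `fetch_page` callable are hypothetical names introduced for illustration. In practice `fetch_page` would wrap the `requests.get(url).json()` call shown above.

```python
import pandas as pd


def collect_pages(fetch_page, n_pages):
    """Fetch pages 0..n_pages-1 and stack them into one de-duplicated frame.

    fetch_page is a hypothetical callable: page number -> list of dict records.
    """
    frames = []
    for page in range(n_pages):
        records = fetch_page(page)
        if not records:  # stop early when a page comes back empty
            break
        frames.append(pd.DataFrame(records))
    if not frames:
        return pd.DataFrame()
    # A single concat at the end avoids repeatedly copying a growing frame
    combined = pd.concat(frames, ignore_index=True)
    return combined.drop_duplicates().reset_index(drop=True)


# Stand-in fetcher for illustration only: three pages of two rows each
def fake_fetch(page):
    if page >= 3:
        return []
    return [{"text": f"headline {page}-{i}", "sentiment": "Neutral"} for i in range(2)]


pages = collect_pages(fake_fetch, n_pages=100)
print(pages.shape)
```

Passing the fetcher in as an argument keeps the paging logic testable without network access.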

Getting Things in Shape: Preprocessing

Before we can work with the data, we need to make sure it’s clean and well-organized. This is like sorting through your treasures and making sure they’re all in the right places.

We drop the null values and convert the data into a numerical form the model can understand: LabelEncoder turns the sentiment labels into integers, and CountVectorizer turns the news text into word counts. We then split the dataset into training and testing sets in a 70:30 ratio.

data = data.dropna()

le = LabelEncoder()
data.sentiment = le.fit_transform(data.sentiment)

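# Keep the 1,000 most frequent terms as features
count_vectorizer = CountVectorizer(max_features=1000)

# Learn the vocabulary, then turn each article into a sparse row of word counts
feature_vector = count_vectorizer.fit(data.text)
train_ds_features = count_vectorizer.transform(data.text)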

train_x, test_x, train_y, test_y =  train_test_split(train_ds_features, data.sentiment,
                                                     test_size = 0.3, random_state = 42)
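To make the two encoding steps concrete, here is a small sketch on made-up text. The variable names and sample sentences are illustrative and not taken from the dataset. LabelEncoder assigns integers in sorted label order, so with the three labels in this article Negative maps to 0, Neutral to 1, and Positive to 2.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import LabelEncoder

# Toy inputs for illustration only
texts = ["profits rise", "profits fall", "shares rise again"]
labels = ["Positive", "Negative", "Positive"]

# Labels become integers, assigned in alphabetical order of the class names
encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)
label_map = {str(name): code for code, name in enumerate(encoder.classes_)}

# Each text becomes a row of word counts, one column per vocabulary term
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(texts)
vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

print(label_map)
print(vocab)
print(counts.toarray())
```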

Training the Model

Now, the fun part! We’ll feed our model with loads of data so it can learn to distinguish between positive and negative news sentiments.

We are using two models here, and you are free to use either of them. Both models are trained on train_x and train_y.

train_y = train_y.astype('int')

nb_clf = BernoulliNB()
nb_clf.fit(train_x.toarray(), train_y)

xg_clf = XGBClassifier()
xg_clf.fit(train_x.toarray(), train_y)
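The article fits the vectorizer on the full dataset before splitting it. One alternative, sketched here and not the method used above, is to wrap the vectorizer and the classifier in a scikit-learn pipeline so the vocabulary is learned from the training rows only and raw headlines can be passed straight to `predict`. The toy texts and the 0/1 labels are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy training data for illustration only
train_texts = [
    "profits surge on strong demand",
    "record revenue and strong growth",
    "shares plunge after weak earnings",
    "losses widen amid weak demand",
]
train_labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative (toy encoding)

# The pipeline fits the vectorizer and the classifier together
model = make_pipeline(CountVectorizer(max_features=1000), BernoulliNB())
model.fit(train_texts, train_labels)

# A new headline is vectorized with the training vocabulary, then classified
prediction = model.predict(["strong revenue growth"])
print(prediction)
```

The same pattern should work with other scikit-learn compatible classifiers in place of `BernoulliNB`.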

Putting It to the Test: Predicting on News Sentiments

With our model all trained and ready, it’s time to see how well it can predict sentiment in real-time news. It’s like letting your pet show off its new tricks in front of an audience.

from sklearn.metrics import classification_report

test_xg_predicted = xg_clf.predict(test_x.toarray())
print(classification_report(test_y, test_xg_predicted))

test_nb_predicted = nb_clf.predict(test_x.toarray())
print(classification_report(test_y, test_nb_predicted))
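The printed reports identify classes by their integer codes. As a small sketch, the `target_names` argument of `classification_report` can put the original label names back into the report. The true and predicted values below are made up, and the encoder is refitted on the three label names only to recover their order.

```python
from sklearn.metrics import classification_report
from sklearn.preprocessing import LabelEncoder

# Recover the label order the encoder uses (alphabetical)
encoder = LabelEncoder().fit(["Negative", "Neutral", "Positive"])

# Made-up true and predicted codes for illustration only
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]

report = classification_report(
    y_true,
    y_pred,
    target_names=list(encoder.classes_),
    zero_division=0,
)
print(report)
```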

Left: XGBoost’s Classification Report; Right: Naive Bayes’ Classification Report

Classification reports can be tricky to read, so here is what the terms in the report mean.

Let’s consider class 0, which is labeled “Negative.” In the test dataset, there are 245 instances of this class. The precision value of 0.77 means that 77% of the articles the model predicted as class 0 really were class 0. In other words, precision measures how trustworthy the model’s predictions for a class are.

However, the recall, also known as sensitivity or the true positive rate, is quite low. This means the model misses many of the articles that truly belong to a class, not just for class 0 but for the other classes as well. One possible reason for this issue could be the imbalanced distribution of classes in the test dataset.

The F1 score is the harmonic mean of precision and recall, giving us a single balanced measure of the model’s performance.

After scrutinizing the classification reports of both models, it is evident that XGBoost performed better than the Naive Bayes model on nearly all the metrics. This suggests that XGBoost is the stronger choice of the two for predicting news sentiment and has the potential for further enhancements.
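If class imbalance is a concern, one option is a stratified split, which keeps the class proportions the same in the training and testing sets. This is a sketch on synthetic labels, not a change the article makes. The 10/30/60 class mix is an assumption chosen for illustration.

```python
from collections import Counter

from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data for illustration only
texts = [f"headline {i}" for i in range(100)]
labels = [0] * 10 + [1] * 30 + [2] * 60

# stratify=labels preserves the 10/30/60 mix on both sides of the split
train_x, test_x, train_y, test_y = train_test_split(
    texts, labels, test_size=0.3, random_state=42, stratify=labels
)

test_counts = Counter(test_y)
print(test_counts)
```

Stratifying does not remove the imbalance, but it does ensure the test set reflects the same mix the model was trained on.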
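The three metrics can be worked out by hand for a single class. The sketch below uses made-up predictions, computes precision, recall, and F1 for class 0 from the raw counts, and then checks them against scikit-learn.

```python
from sklearn.metrics import precision_recall_fscore_support

# Made-up true and predicted classes for illustration only
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 2, 1, 1, 2, 2, 0, 2]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 0 and p == 0 for t, p in pairs)  # predicted 0, truly 0
fp = sum(t != 0 and p == 0 for t, p in pairs)  # predicted 0, truly another class
fn = sum(t == 0 and p != 0 for t, p in pairs)  # truly 0, predicted another class

precision = tp / (tp + fp)  # share of class-0 predictions that are right
recall = tp / (tp + fn)     # share of true class-0 rows that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

# Cross-check against scikit-learn for class 0 only
sk_precision, sk_recall, sk_f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0], zero_division=0
)
print(precision, recall, f1)
```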

Conclusion: Charting New Territories

To sum it up, this article showed a way to predict how news makes people feel using data from the Financial Modeling Prep API. This data has many potential uses, but our main focus was on predicting sentiment, which helps us understand the market better and make smarter decisions. The model we introduced can predict how stock news is perceived in real time, which can be very helpful for investors and analysts!

With that being said, you’ve reached the end of the article. Hope you learned something new and useful today. If you have any suggestions for improving the ML model we built in this article, kindly let me know in the comments. Thank you very much for your time.

