  • Nikhil Adithyan

Time Series Forecasting with Prophet & APIs

One of the simplest yet effective methods to conduct time series forecasting in Python



Time series forecasting is a powerful tool in predictive financial analysis. Whether the target is stock prices, revenue by geographic segment, or revenue by product, the ability to predict future trends is crucial for making informed decisions. This article shows how to pull segmented revenue data from the Financial Modeling Prep (FMP) API, prepare it, and forecast it with Prophet, giving both individuals and companies a roadmap for strategic planning.


Importing the necessary packages


import requests
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from prophet import Prophet

plt.rcParams['figure.figsize'] = (20,10)
plt.style.use('fivethirtyeight')

  1. requests: provides a simple API for making HTTP requests such as GET and POST.

  2. pandas: a fast, powerful, flexible, and easy-to-use open-source data analysis and manipulation tool.

  3. matplotlib: creates static, animated, and interactive visualizations in Python.

  4. adfuller: runs the Augmented Dickey-Fuller (ADF) test, which checks a time series for stationarity.

  5. Prophet: a time series forecasting library developed by Facebook. The Prophet class is used to create forecasting models.

Before moving further, note that the data comes from the FinancialModelingPrep (FMP) APIs. To run the upcoming code blocks, you need an FMP developer account and its API key, which you can create on the FMP website.


Extracting the information from FMP’s APIs

The following function, extract_geographical_revenue, retrieves a company’s quarterly revenue broken down by geographic region. This is important for understanding the company’s global revenue distribution.


def extract_geographical_revenue(symbol):
    api_key = 'YOUR API KEY'
    url = f'https://financialmodelingprep.com/api/v4/revenue-geographic-segmentation?symbol={symbol}&period=quarter&structure=flat&apikey={api_key}'
    raw_df = requests.get(url).json()

    # Convert JSON to DataFrame
    data = []
    for entry in raw_df:
        for date, values in entry.items():
            row = {'Date': date}
            for country, value in values.items():
                row[country] = value
            data.append(row)

    df = pd.DataFrame(data)

    return df

df_geographical = extract_geographical_revenue('TSLA')
df_geographical.head()
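
To make the flattening step concrete, here is a minimal sketch that runs the same kind of loop on a hand-written payload instead of a live API call. The payload shape (a list of single-key dictionaries, each mapping a date to its region totals) is inferred from the parsing loop above, and the numbers in sample_payload are invented for illustration. They are not real Tesla figures, and flatten_segments is a hypothetical helper name.

```python
import pandas as pd

# Hypothetical payload: shape inferred from the parsing loop, values invented.
sample_payload = [
    {"2023-06-30": {"UNITED STATES": 100, "CHINA": 40}},
    {"2023-03-31": {"U S": 90, "C N": 35}},
]

def flatten_segments(payload):
    """Turn [{date: {segment: value}}, ...] into one row per date."""
    rows = []
    for entry in payload:
        for date, values in entry.items():
            # Merge the date and the segment totals into a single flat row
            rows.append({"Date": date, **values})
    return pd.DataFrame(rows)

df_sample = flatten_segments(sample_payload)
print(df_sample)
```

A segment that is absent in a given quarter shows up as NaN in that row, which is why the preparation step later fills missing values with zero before adding columns together.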


The following function, extract_product_revenue, retrieves how a company’s revenue is distributed across its product categories. This segmentation is crucial for understanding which products contribute most to the company’s overall revenue.


def extract_product_revenue(symbol):
    api_key = 'YOUR API KEY'
    url = f'https://financialmodelingprep.com/api/v4/revenue-product-segmentation?symbol={symbol}&period=quarter&structure=flat&apikey={api_key}'
    raw_df = requests.get(url).json()
    # Create an empty DataFrame
    df = pd.DataFrame()

    # Iterate through the data and append to the DataFrame
    for entry in raw_df:
        for date, values in entry.items():
            df = pd.concat([df, pd.DataFrame.from_dict({date: values}, orient='index')])

    # Reset the index
    df.reset_index(inplace=True)
    df.rename(columns={'index': 'Date'}, inplace=True)

    return df

df_product = extract_product_revenue('TSLA')
df_product.head()


Exploring and preparing the dataset

Before training the model, we need to put the data into a format the model can work with. In this section, we clean the geographic revenue data: fill missing values, merge columns that refer to the same region under different labels, and index the rows by date.


# Change all the NaN values to zero so the columns can be added together.
df_geographical = df_geographical.fillna(0)

# Merge all the United States columns.
df_geographical['United States'] = df_geographical['UNITED STATES'] + df_geographical['U S'] + df_geographical['U']

# Merge all the China columns.
df_geographical['China'] = df_geographical['CHINA'] + df_geographical['C N']

# Drop the columns that have been merged.
df_geographical = df_geographical.drop(['UNITED STATES', 'U S', 'U', 'CHINA', 'C N'], axis=1)

# Reverse the rows so they run from the earliest quarter to the latest.
df_geographical = df_geographical.iloc[::-1]

# Reset the index
df_geographical = df_geographical.reset_index(drop=True)
df_geographical.head()

# Set the 'Date' column as the index and convert it to datetime
df_geographical = df_geographical.set_index('Date')
df_geographical.index = pd.to_datetime(df_geographical.index)

# Drop any remaining NA values.
df_geographical.dropna(inplace=True)

df_geographical.head()
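
The column merging above is written out by hand for Tesla's labels. As a rough sketch of a more general approach, the same idea can be driven by a mapping from raw labels to canonical names. The ALIASES dictionary, the merge_aliases helper, and the toy numbers below are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical alias map; the labels mirror the TSLA columns used above.
ALIASES = {
    "UNITED STATES": "United States",
    "U S": "United States",
    "CHINA": "China",
    "C N": "China",
}

def merge_aliases(df, aliases):
    """Sum columns that map to the same canonical name."""
    renamed = df.fillna(0).rename(columns=aliases)
    # Transpose so duplicate labels sit on the index, group them, transpose back
    return renamed.T.groupby(level=0).sum().T

# Toy data (invented): each quarter reports a region under a different label
raw = pd.DataFrame({
    "UNITED STATES": [1.0, None],
    "U S": [None, 2.0],
    "CHINA": [3.0, None],
    "C N": [None, 4.0],
})
merged = merge_aliases(raw, ALIASES)
print(merged)
```

Keeping the aliases in one dictionary means a new label only needs a new entry, not another line of column arithmetic.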

Understanding Stationarity

Stationarity is a fundamental concept in time series analysis. A stationary time series is one whose statistical properties, such as mean and variance, remain constant over time. It implies that the series has a consistent behavior, and its patterns are predictable over different time periods.


Stationary time series exhibit the following properties:


  1. Constant mean: The mean of the series remains the same throughout time.

  2. Constant variance: The variance (or standard deviation) of the series remains constant over time.

  3. Constant autocovariance: The covariance between observations a given number of lags apart depends only on that lag, not on the point in time at which it is measured.


Stationarity is important because many time series analysis techniques and forecasting models assume stationarity or work best with stationary data.
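
As an informal illustration of the first two properties, the sketch below compares the mean and variance of the first and second half of two synthetic series. Both series are generated here for the example and have nothing to do with the Tesla data. The half_stats helper is a hypothetical name, and this split-half comparison is only a quick intuition check, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
noise = rng.normal(0.0, 1.0, n)          # stationary: constant mean and variance
trending = 0.5 * np.arange(n) + noise    # non-stationary: mean grows over time

def half_stats(series):
    """Return (mean, variance) for the first and second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (first.mean(), first.var()), (second.mean(), second.var())

for name, series in [("noise", noise), ("trending", trending)]:
    (m1, v1), (m2, v2) = half_stats(series)
    print(f"{name}: mean {m1:.2f} -> {m2:.2f}, variance {v1:.2f} -> {v2:.2f}")
```

For the noise series the two halves have nearly the same mean, while the trending series shows a clear jump in mean between halves, which is the kind of behavior that violates stationarity.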


# Visualize the time series to check for stationarity and patterns

plt.plot(df_geographical.index, df_geographical['United States'])
plt.xlabel('Quarter')
plt.ylabel('United States revenue')
plt.title('Revenue from the United States each quarter')
plt.show()


Tesla's revenue from the United States rises quarter over quarter, and the slope steepens over time.


The ADF test helps determine whether a time series is stationary. Its null hypothesis is that the series has a unit root, that is, that it is non-stationary. The test returns the ADF statistic, a p-value, and critical values for different significance levels.


If the p-value is less than a chosen significance level (e.g., 0.05), we reject the null hypothesis and conclude that the time series is stationary. Otherwise, we fail to reject the null hypothesis, which suggests that the time series is non-stationary.


# Perform the Augmented Dickey-Fuller test

result = adfuller(df_geographical['United States'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
print('Critical Values:')
for key, value in result[4].items():
    print(key, ':', value)
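
The decision rule described above can be wrapped in a small helper so the output is easier to read. This is only a sketch: interpret_adf is a hypothetical function, and the numbers passed to it below are illustrative values, not the output of the test on the Tesla series.

```python
def interpret_adf(statistic, p_value, critical_values, alpha=0.05):
    """Apply the ADF decision rule to already-computed test output."""
    return {
        # p-value below alpha: reject the unit-root null hypothesis
        "reject_null": p_value < alpha,
        # a statistic more negative than a critical value rejects at that level
        "below_critical": {
            level: statistic < value for level, value in critical_values.items()
        },
    }

# Illustrative numbers only, not results from the Tesla data.
example = interpret_adf(
    statistic=-1.2,
    p_value=0.67,
    critical_values={"1%": -3.6, "5%": -2.9, "10%": -2.6},
)
print(example)
```

With the real test output, the call would be interpret_adf(result[0], result[1], result[4]), using the same result tuple as in the code above.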


Trend Analysis

Trends help to uncover the underlying direction or pattern in the time series, which can be valuable for forecasting and decision-making.

To perform trend analysis on a time series, we can use various techniques, including visual inspection, moving averages, and regression analysis.


Visual Inspection: Plotting the time series data can often reveal the presence of a trend. A clear upward or downward movement over time suggests the presence of a trend component. Visual inspection allows you to observe the overall pattern and identify any deviations or changes in the series.


Moving Averages: Moving averages are widely used for trend analysis. They smooth out short-term fluctuations in the data, making it easier to identify the underlying trend. Three types are commonly used:


  1. Simple Moving Average (SMA): The Simple Moving Average is the average of the data points inside a window of fixed size. Every point in the window carries the same weight, whether old or new. The SMA provides a smoothed representation of the data by reducing random fluctuations.

  2. Weighted Moving Average (WMA): The Weighted Moving Average assigns different weights to the data points within the window. The weights can be linear or follow a specific pattern. The WMA gives more emphasis to recent observations, allowing it to respond more quickly to changes in the data compared to the SMA.

  3. Exponential Moving Average (EMA): The Exponential Moving Average is a type of weighted moving average that assigns exponentially decreasing weights to the data points. It places more weight on recent observations while gradually reducing the importance of older observations. The EMA is more responsive to recent changes in the data and is often used in technical analysis.
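
The three averages listed above can be sketched with pandas on a small toy series. The data, the window of 3, and the linear weights 1, 2, 3 for the WMA are choices made for this example only.

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # toy data
window = 3

# SMA: every point in the window has equal weight
sma = values.rolling(window=window).mean()

# WMA: linear weights 1, 2, 3 so the newest point counts most
weights = np.arange(1, window + 1)
wma = values.rolling(window=window).apply(
    lambda w: np.dot(w, weights) / weights.sum(), raw=True
)

# EMA: exponentially decaying weights; span=3 gives alpha = 2 / (3 + 1) = 0.5
ema = values.ewm(span=window, adjust=False).mean()

print(pd.DataFrame({"value": values, "SMA": sma, "WMA": wma, "EMA": ema}))
```

On this rising series the WMA sits above the SMA at every step, because the heavier weight on recent points pulls it toward the latest value. The SMA and WMA are undefined until a full window is available, while the EMA produces a value from the first observation.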

Regression Analysis: Regression analysis can be applied to estimate and visualize the trend component of a time series. It involves fitting a regression model to the data, where time is considered as an independent variable and the variable of interest is the dependent variable. The trend component can then be extracted from the regression model.
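
A minimal sketch of the regression approach, assuming a simple straight-line trend: time is the independent variable, and a least-squares line is fitted with NumPy. The series here is a toy example with a known slope and intercept, not the revenue data.

```python
import numpy as np

t = np.arange(8)                      # time index: 0, 1, ..., 7
y = 2.0 * t + 5.0                     # toy series with a known linear trend

# Fit y = slope * t + intercept by least squares
slope, intercept = np.polyfit(t, y, deg=1)
trend = slope * t + intercept         # fitted trend component
residual = y - trend                  # what the trend does not explain

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

On real data the residual would hold the seasonal and irregular parts of the series. A straight line is only one choice, and a higher polynomial degree or another model may fit a curved trend better.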


# Calculate the rolling mean (simple moving average) with a window size of 3 quarters
rolling_mean = df_geographical['United States'].rolling(window=3).mean()

# Visualize the original time series and the trend component
plt.plot(df_geographical.index, df_geographical['United States'], label='Original')
plt.plot(df_geographical.index, rolling_mean, color='red', label='Trend (Moving Average)')
plt.xlabel('Quarter')
plt.ylabel('Revenue from United States')
plt.title('Trend Analysis: United States Revenue')
plt.legend()
plt.show()


Training the model and forecasting

With the data in the proper format, we are ready to train the model and forecast future quarters:


# Format data for the Prophet model, which expects columns named ds and y
df_geographical = df_geographical.reset_index() \
    .rename(columns={'Date':'ds',
                     'United States':'y'})

# Set up the model and fit it to the data
model = Prophet()
model.fit(df_geographical)

# Define the quarter-end dates for which we want a prediction
future = ['2023-09-30','2023-12-31','2024-03-31','2024-06-30','2024-09-30','2024-12-31','2025-03-31','2025-06-30','2025-09-30']
future = pd.DataFrame({'ds': pd.to_datetime(future)})

forecast = model.predict(future)

# Plot the forecast
model.plot(forecast)
plt.show()

The model predicts that revenue first dips below the previous year's level and then rises over the following quarters.
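
The forecast dates above are typed by hand. As an alternative sketch, pandas can generate calendar quarter-end dates from the last observed date. The next_quarter_ends helper is a hypothetical name, and the starting date of 2023-06-30 is an assumption made for the example. In practice it would come from the last row of the prepared data.

```python
import pandas as pd

def next_quarter_ends(last_date, periods):
    """Return the `periods` calendar quarter-end dates that follow last_date."""
    last = pd.Timestamp(last_date)
    dates = [last + pd.offsets.QuarterEnd(i) for i in range(1, periods + 1)]
    return pd.DataFrame({"ds": dates})

# Assumes the last observed quarter ended on 2023-06-30 (hypothetical).
future_sketch = next_quarter_ends("2023-06-30", periods=4)
print(future_sketch)
```

The resulting frame has the single ds column that model.predict expects, and the dates land on the true quarter ends (for example December 31 and March 31).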


Conclusion

In conclusion, time series forecasting is a game-changer in the financial world. Leveraging APIs like Financial Modeling Prep opens up a treasure trove of data, enabling individuals and companies to make strategic decisions based on valuable insights. From cleaning and preparing data to trend analysis and forecasting, this article serves as a comprehensive guide for anyone looking to navigate the complex landscape of financial time series analysis.
