Mastering Volatility Forecasting: A Step-by-Step Guide to Building a Powerful GARCH Model in Python

7 min read · Jul 14, 2023

This article is a step-by-step guide to building a volatility forecasting model in Python. We will use the yfinance library to retrieve historical price data, compute log returns from it, and fit a GARCH (Generalized Autoregressive Conditional Heteroskedasticity) model with the arch library to estimate and forecast volatility.

Volatility is a crucial aspect of financial markets as it measures the degree of variation in the price of a financial instrument over time. Accurate volatility forecasting can assist traders and investors in making informed decisions and managing risk effectively.

We will cover the following topics:

Introduction to Volatility Forecasting
Retrieving Historical Volatility Data with yfinance
Exploratory Data Analysis (EDA) of Volatility Data
Implementing the GARCH Model
Estimating and Forecasting Volatility
Evaluating the Model Performance

Before we dive into the implementation, make sure you have the following Python libraries installed:

yfinance
numpy
pandas
matplotlib
statsmodels
arch

You can install these libraries using pip:

pip install yfinance numpy pandas matplotlib statsmodels arch

1. Introduction to Volatility Forecasting

Volatility is a statistical measure of the dispersion of returns for a given financial instrument. It quantifies the degree of variation in the price of an asset over a specific period. Volatility is an essential concept in finance as it helps investors and traders assess the risk associated with an investment.

Volatility forecasting involves predicting the future volatility of a financial instrument based on historical data.

Accurate volatility forecasts can assist in various financial applications, such as portfolio optimization, risk management and option pricing.

There are several methods for volatility forecasting, including historical volatility, implied volatility and model-based approaches. In this tutorial, we will focus on the GARCH (Generalized Autoregressive Conditional Heteroskedasticity) model, which is widely used for volatility forecasting.
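As a point of reference for the simplest of these methods, historical volatility is the standard deviation of past returns, usually annualized. The following is a minimal sketch, assuming daily data and 252 trading days per year; the helper name historical_volatility and the price series are made up for illustration.

```python
import numpy as np

def historical_volatility(prices, periods_per_year=252):
    """Annualized standard deviation of log returns (illustrative helper)."""
    prices = np.asarray(prices, dtype=float)
    log_returns = np.diff(np.log(prices))
    # ddof=1 gives the sample standard deviation
    return log_returns.std(ddof=1) * np.sqrt(periods_per_year)

# Made-up closing prices, for illustration only
example_prices = [100.0, 101.0, 100.5, 102.0, 101.0, 103.0]
hist_vol = historical_volatility(example_prices)
print(f"Annualized historical volatility: {hist_vol:.4f}")
```

Unlike GARCH, this measure weights every observation in the window equally and does not react faster to recent shocks.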

2. Retrieving Historical Volatility Data with `yfinance`

To develop our volatility forecasting model, we need historical price data, from which we will later derive returns and volatility. We can retrieve this data using the yfinance library, which provides an easy-to-use interface for accessing financial data from Yahoo Finance.

Let’s start by installing the yfinance library:

pip install yfinance

Once installed, we can import the library and retrieve historical price data for a specific stock. For this tutorial, we will use the stock symbol “GS” (Goldman Sachs Group Inc.) as an example. The resulting DataFrame is named volatility_data throughout, although at this stage it holds prices, not volatility.

import yfinance as yf

# Retrieve the full price history for GS (prices, not volatility yet)
stock = yf.Ticker("GS")
volatility_data = stock.history(period="max")

The history method retrieves historical price data for the specified stock symbol. By setting the period parameter to "max", we retrieve the entire available history of the stock.
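Downloaded price histories can contain missing values or out-of-order rows, so it may be worth tidying the closing prices before going further. The sketch below assumes a DataFrame with a "Close" column and a date index, like the one returned above; the helper name clean_close is hypothetical, and a small synthetic frame stands in for the real download.

```python
import numpy as np
import pandas as pd

def clean_close(price_frame):
    """Return the Close column sorted by date, without duplicates or gaps."""
    close = price_frame["Close"].sort_index()
    # Keep the last row for any duplicated timestamp, then drop missing prices
    close = close[~close.index.duplicated(keep="last")]
    return close.dropna()

# Synthetic stand-in for the downloaded data (values are made up)
dates = pd.to_datetime(["2023-01-04", "2023-01-03", "2023-01-05", "2023-01-06"])
sample = pd.DataFrame({"Close": [101.0, 100.0, np.nan, 102.5]}, index=dates)

close = clean_close(sample)
print(close)
```

With the real data, the same helper would be called as clean_close(volatility_data).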

3. Exploratory Data Analysis (EDA) of Volatility Data

Before implementing the GARCH model, it’s essential to perform exploratory data analysis (EDA) on the data. EDA helps us understand its characteristics and identify any patterns or anomalies.

Let’s start by visualizing the historical closing prices using a line plot:

import matplotlib.pyplot as plt

# Plot the historical closing prices
plt.figure(figsize=(10, 6))
plt.plot(volatility_data.index, volatility_data["Close"])
plt.xlabel("Date")
plt.ylabel("Close Price")
plt.title("Historical Closing Price")
plt.grid(True)

plt.show()

Figure 1: Line plot of the historical closing price. Created by Author

The line plot shows how the closing price evolves over time. It helps us spot trends, abrupt moves, and outliers in the data.

Next, let’s calculate and plot the 30-day rolling mean and standard deviation of the closing price:

# Calculate the 30-day rolling mean and standard deviation of the closing price
rolling_mean = volatility_data["Close"].rolling(window=30).mean()
rolling_std = volatility_data["Close"].rolling(window=30).std()

# Plot the closing price with its rolling mean and standard deviation
plt.figure(figsize=(10, 6))
plt.plot(volatility_data.index, volatility_data["Close"], label="Close Price")
plt.plot(rolling_mean.index, rolling_mean, label="Rolling Mean")
plt.plot(rolling_std.index, rolling_std, label="Rolling Std")
plt.xlabel("Date")
plt.ylabel("Price")
plt.title("Rolling Mean and Standard Deviation of the Closing Price")
plt.legend()
plt.grid(True)

plt.show()

Figure 2: Line plot of the historical closing price with rolling mean and standard deviation. Created by Author

The rolling mean smooths out short-term noise and shows the long-term price trend, while the rolling standard deviation shows how widely the price moved within each 30-day window. Stretches where the rolling standard deviation stays elevated hint at volatility clustering, the tendency of turbulent periods to follow turbulent periods, which is the pattern GARCH is designed to model. Because it is computed on price levels, the rolling standard deviation also grows with the price itself, so it is only a rough guide.
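Volatility clustering can also be checked numerically: returns themselves are close to uncorrelated, while squared returns are positively autocorrelated. The sketch below illustrates this on simulated data, assuming a GARCH(1, 1) process with made-up parameters; lag1_autocorr is a hypothetical helper, not a library function.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a 1-D array."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

# Simulate returns from a GARCH(1, 1) process (parameters are made up)
rng = np.random.default_rng(0)
omega, alpha, beta = 1e-6, 0.10, 0.85
n = 20000
sim_returns = np.empty(n)
variance = omega / (1 - alpha - beta)  # start at the long-run variance
for t in range(n):
    sim_returns[t] = np.sqrt(variance) * rng.standard_normal()
    variance = omega + alpha * sim_returns[t] ** 2 + beta * variance

raw_acf = lag1_autocorr(sim_returns)
squared_acf = lag1_autocorr(sim_returns ** 2)
print(f"lag-1 autocorrelation of returns:         {raw_acf:.3f}")
print(f"lag-1 autocorrelation of squared returns: {squared_acf:.3f}")
```

The same check can be applied to the log returns of the downloaded prices once they have been computed.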

4. Implementing the GARCH Model

The GARCH (Generalized Autoregressive Conditional Heteroskedasticity) model is a popular model for volatility forecasting. It captures the time-varying nature of volatility by modeling the current conditional variance as a function of lagged squared shocks (the unexpected part of past returns) and lagged values of the conditional variance itself.
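For reference, the standard GARCH(1, 1) specification with a constant mean can be written as below. The symbols are notation introduced here for illustration: r_t is the return, mu the mean, epsilon_t the shock, sigma_t^2 the conditional variance, and omega, alpha, beta the parameters to be estimated.

```latex
\begin{aligned}
r_t &= \mu + \epsilon_t, \qquad \epsilon_t = \sigma_t z_t, \qquad z_t \sim \mathcal{N}(0, 1) \\
\sigma_t^2 &= \omega + \alpha\,\epsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2
\end{aligned}
```

The usual constraints are omega > 0, alpha >= 0, beta >= 0, and alpha + beta < 1, the last of which keeps the variance from drifting off to infinity.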

To implement the GARCH model, we will use the arch library, which provides a comprehensive set of tools for estimating and forecasting volatility models.

Let’s start by installing the arch library:

pip install arch

Once installed, we can import the necessary classes and functions from the library:

import numpy as np
from arch import arch_model

Next, we need to preprocess the price data by calculating the log returns:

# Calculate log returns from the closing prices
returns = np.log(volatility_data["Close"]).diff().dropna()

Log returns are the differences between the logarithms of consecutive prices. For small moves they are approximately equal to the percentage change in price from one period to the next, and they are commonly used in financial analysis because they add up across periods.
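A small numeric comparison shows how log returns relate to simple percentage returns. This is a sketch on made-up prices, not on the downloaded data.

```python
import numpy as np

# Made-up prices, for illustration only
prices = np.array([100.0, 101.0, 99.0])

simple_returns = prices[1:] / prices[:-1] - 1
log_returns = np.diff(np.log(prices))

print("simple returns:", simple_returns)
print("log returns:   ", log_returns)

# Log returns add up across periods: their sum is the log of the total price ratio
total_log_return = log_returns.sum()
print("total log return:", total_log_return)
```

The two kinds of return are close for small moves and drift apart as moves get larger.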

Now, we can fit the GARCH model to the log returns:

# Fit the GARCH(1, 1) model
model = arch_model(returns, vol="Garch", p=1, q=1)
results = model.fit()

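In this example, we are fitting a GARCH(1, 1) model: p=1 adds one lag of the squared shock (the ARCH term) and q=1 adds one lag of the conditional variance (the GARCH term). By default, arch_model also estimates a constant mean and assumes normally distributed errors. Because daily log returns are small numbers, arch may warn that the data is poorly scaled; a common remedy is to multiply the returns by 100, in which case all volatilities are expressed in percent. You can experiment with different model specifications to find the best fit for your data.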
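To make the recursion behind a GARCH(1, 1) model concrete, the conditional variance can be computed by hand for given parameters. This is a minimal sketch with made-up residuals and parameters; garch11_variance is a hypothetical helper, and starting the recursion at the sample variance is one simple choice that need not match the initialization the arch library uses.

```python
import numpy as np

def garch11_variance(residuals, omega, alpha, beta):
    """Conditional variance path of a GARCH(1, 1) model for given parameters."""
    residuals = np.asarray(residuals, dtype=float)
    sigma2 = np.empty(len(residuals))
    # Start the recursion at the sample variance of the residuals
    sigma2[0] = residuals.var()
    for t in range(1, len(residuals)):
        sigma2[t] = omega + alpha * residuals[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Made-up residuals and parameters, for illustration only
eps = np.array([0.01, -0.03, 0.005, 0.02])
sigma2 = garch11_variance(eps, omega=1e-5, alpha=0.1, beta=0.85)
print("conditional volatility:", np.sqrt(sigma2))
```

Fitting the model amounts to searching for the omega, alpha, and beta that make the observed returns most likely under this recursion.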

5. Estimating and Forecasting Volatility

Once we have fitted the GARCH model, we can estimate and forecast the volatility. The estimated volatility is the in-sample conditional standard deviation of the log returns (the square root of the conditional variance), while the forecasted volatility provides predictions for future periods.

Let’s start by estimating the volatility using the fitted model:

# Estimate the volatility
volatility = results.conditional_volatility
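A model fitted to daily returns produces daily volatility, whereas volatility is often quoted per year. A common convention, assumed here, is to scale by the square root of 252 trading days; the helper name annualize and the input values are made up for illustration.

```python
import numpy as np

TRADING_DAYS = 252  # assumed number of trading days per year

def annualize(daily_volatility, periods_per_year=TRADING_DAYS):
    """Scale per-period volatility to an annual figure (square-root-of-time rule)."""
    return np.asarray(daily_volatility, dtype=float) * np.sqrt(periods_per_year)

# Made-up daily volatilities of 1% and 2%
daily = np.array([0.01, 0.02])
annual = annualize(daily)
print(annual)
```

Daily volatilities of 1% and 2% correspond to roughly 16% and 32% per year under this rule.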

Next, let’s plot the estimated volatility along with the log returns it was fitted to:

# Plot the estimated volatility against the log returns
plt.figure(figsize=(10, 6))
plt.plot(volatility.index, volatility, label="Estimated Volatility")
plt.plot(returns.index, returns, label="Log Returns")
plt.xlabel("Date")
plt.ylabel("Log Return / Volatility")
plt.title("Estimated Volatility and Log Returns")
plt.legend()
plt.grid(True)

plt.show()

Figure 3: Line plot of estimated volatility and log returns. Created by Author

The plot shows the estimated volatility (conditional standard deviation) and the log returns over time. Returns are not a direct measurement of volatility, which cannot be observed, but the estimated volatility should rise in periods where returns swing widely and fall in calm periods.

Now, let’s forecast the future volatility using the fitted model:

# Forecast the variance for the next 30 periods, starting from the last observation
forecast = results.forecast(horizon=30)

# Take the forecasts made at the final date and convert variance to volatility
forecast_volatility = np.sqrt(forecast.variance.iloc[-1].values)

# Plot the forecasted volatility
plt.figure(figsize=(10, 6))
plt.plot(range(1, len(forecast_volatility) + 1), forecast_volatility, label="Forecasted Volatility")
plt.xlabel("Periods Ahead")
plt.ylabel("Volatility")
plt.title("Forecasted Volatility")
plt.legend()
plt.grid(True)

plt.show()

Figure 4: Line plot of forecasted volatility. Created by Author

The plot displays the forecasted volatility for the next 30 periods. It provides insights into the expected future volatility based on the GARCH model.
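For a GARCH(1, 1) model, multi-step variance forecasts have a closed form: they move geometrically from the one-step-ahead variance toward the long-run variance omega / (1 - alpha - beta). The sketch below illustrates that behavior with made-up parameters; garch11_forecast_variance is a hypothetical helper, and with a real fit the parameters would come from the model's estimates.

```python
import numpy as np

def garch11_forecast_variance(next_variance, omega, alpha, beta, horizon):
    """Variance forecasts for 1..horizon steps ahead under GARCH(1, 1)."""
    long_run = omega / (1 - alpha - beta)
    steps = np.arange(horizon)  # 0 corresponds to the one-step-ahead forecast
    return long_run + (alpha + beta) ** steps * (next_variance - long_run)

# Made-up parameters and one-step-ahead variance, for illustration only
omega, alpha, beta = 1e-5, 0.10, 0.85
long_run_variance = omega / (1 - alpha - beta)
path = garch11_forecast_variance(4e-4, omega, alpha, beta, horizon=30)

print("long-run volatility:", np.sqrt(long_run_variance))
print("forecast volatility at 1, 10, 30 steps:", np.sqrt(path[[0, 9, 29]]))
```

This is why the forecast curve is smooth: when current volatility is above its long-run level the forecasts decay toward it, and when it is below they rise toward it.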

6. Evaluating the Model Performance

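To evaluate the performance of our volatility model, we can calculate error metrics such as the mean absolute error (MAE) and the root mean squared error (RMSE). True volatility is not observable, so the code below compares the estimated in-sample volatility with the log returns as a rough stand-in. Absolute or squared returns are more commonly used proxies, and a genuine forecast evaluation would use data held out from the fit.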

Let’s start by calculating the MAE and RMSE:

# Calculate the mean absolute error (MAE)
mae = np.mean(np.abs(volatility - returns))
print("Mean Absolute Error (MAE):", mae)

# Calculate the root mean squared error (RMSE)
rmse = np.sqrt(np.mean((volatility - returns) ** 2))
print("Root Mean Squared Error (RMSE):", rmse)

The MAE is the average size of the errors, while the RMSE squares the errors before averaging and therefore penalizes large errors more heavily. Lower values indicate a closer fit. For this run, the output was:

Mean Absolute Error (MAE): 0.023676014790345607

Root Mean Squared Error (RMSE): 0.032365900632945505
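Since signed returns average close to zero, absolute returns are a more natural, though still noisy, proxy for realized volatility. Here is a minimal sketch of the same two metrics computed against such a proxy; the arrays are made up, and with the fitted model the inputs would be the estimated volatility and the absolute log returns.

```python
import numpy as np

def mean_absolute_error(predicted, proxy):
    """Average absolute difference between predictions and the proxy."""
    return float(np.mean(np.abs(np.asarray(predicted) - np.asarray(proxy))))

def root_mean_squared_error(predicted, proxy):
    """Square root of the average squared difference."""
    return float(np.sqrt(np.mean((np.asarray(predicted) - np.asarray(proxy)) ** 2)))

# Made-up values, for illustration only
predicted_vol = np.array([0.01, 0.02, 0.03])
abs_returns = np.array([0.02, 0.02, 0.01])

proxy_mae = mean_absolute_error(predicted_vol, abs_returns)
proxy_rmse = root_mean_squared_error(predicted_vol, abs_returns)
print("MAE: ", proxy_mae)
print("RMSE:", proxy_rmse)
```

The RMSE comes out larger than the MAE here because the single 0.02 miss dominates once errors are squared.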

Additionally, we can visualize the forecast errors using a histogram:

# Calculate the forecast errors
errors = volatility - returns

# Plot the histogram of forecast errors
plt.figure(figsize=(10, 6))
plt.hist(errors, bins=30, density=True)
plt.xlabel("Forecast Error")
plt.ylabel("Density")
plt.title("Histogram of Forecast Errors")
plt.grid(True)

plt.show()

Figure 5: Histogram of forecast errors. Created by Author

The histogram shows how the errors are distributed. Its center shows whether the estimates sit systematically above or below the comparison series, and its spread and tails show how large the misses can get.

Conclusion

In this tutorial, we developed a volatility forecasting model using Python. We retrieved historical price data using the yfinance library, performed exploratory data analysis (EDA), computed log returns, fitted a GARCH model, estimated and forecasted volatility, and evaluated the model performance.

Volatility forecasting plays a crucial role in financial analysis and risk management. By accurately predicting future volatility, traders and investors can make informed decisions and effectively manage their portfolios.

Remember to experiment with different model specifications and data sources to improve the accuracy of your volatility forecasts. Volatility forecasting is a complex task that requires continuous learning and refinement.

I hope this tutorial has provided you with a solid foundation for developing your own volatility forecasting models.


Written by The AI Quant
