AI-powered Option Strategy Generation with Python

Mar 17, 2024

Welcome to the fascinating world of AI-powered option strategy generation with Python! In this comprehensive tutorial, we’ll embark on a journey to explore the intersection of machine learning and finance, equipping you with the knowledge and tools to create innovative and potentially profitable options strategies.

This tutorial is designed to be both informative and practical, guiding you through the entire process step-by-step. We’ll delve into the theoretical underpinnings of AI-powered option strategy generation, while simultaneously providing you with the necessary Python code examples and practical applications to build your own strategies from scratch.

Throughout this journey, we’ll keep a conversational tone and organize the code into small, reusable functions to keep it clear. We’ll use yfinance to acquire real-world options data and apply quantitative analysis techniques, such as the Black-Scholes greeks, to turn that data into model features. We’ll then use the Keras deep learning library to build a model that predicts option prices, and turn its predictions into trading signals.

Prerequisites:

Basic understanding of Python programming
Familiarity with financial concepts like options and their pricing
Interest in exploring the intersection of AI and finance

Table of Contents:

Data Acquisition with yfinance
Quantitative Analysis of Options Data
Model Development with Keras
Option Strategy Generation and Backtesting
Conclusion and Future Directions

1. Data Acquisition with yfinance

In this section, we’ll delve into the exciting world of acquiring real-world options data using the powerful yfinance library. We'll leverage two key functionalities:

Downloading Historical Price Data: yfinance allows you to download historical price data for various assets, including underlying securities for options contracts. This data serves as a crucial foundation for understanding historical market trends and option pricing behavior.
Accessing Real-Time Options Data: yfinance can retrieve the current options chain for a ticker (Yahoo Finance quotes may be delayed), providing insight into current market sentiment and contract details such as strike price, expiry date, bid/ask prices, and implied volatility. This information is essential for generating and evaluating option strategies.

1.1 Downloading Historical Price Data

While yfinance doesn't directly provide historical options data, it excels at downloading historical price data for the underlying asset. This data serves as a valuable starting point for our analysis:

import yfinance as yf

# Example: Downloading historical price data for Apple (AAPL)
aapl_data = yf.download("AAPL", start="2023-01-01", end="2024-02-24")

# Accessing specific data points like closing price
aapl_closing_prices = aapl_data["Close"]

This code snippet downloads daily price data for Apple (AAPL) starting January 1st, 2023. The end date is exclusive in yfinance, so the last row is the final trading day before February 24th, 2024. You can then extract specific columns, such as closing prices, for further analysis.
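One quick use of these closing prices is an estimate of realized (historical) volatility, which you can later compare with an option's implied volatility. Below is a minimal sketch. It assumes 252 trading days per year, a 21-day window, and a pandas Series of closes such as aapl_closing_prices. The helper name realized_volatility is hypothetical, and the synthetic prices stand in for downloaded data:

```python
import numpy as np
import pandas as pd


def realized_volatility(closes: pd.Series, window: int = 21, periods_per_year: int = 252) -> pd.Series:
    """Rolling annualized volatility of daily log returns (hypothetical helper)."""
    log_returns = np.log(closes / closes.shift(1))
    # Standard deviation over the window, scaled by sqrt(periods per year) to annualize
    return log_returns.rolling(window).std() * np.sqrt(periods_per_year)


# Synthetic closes that alternate between +1% and -1% daily moves
closes = pd.Series(100 * np.cumprod([1.0] + [1.01, 0.99] * 30))
vol = realized_volatility(closes)
print(vol.dropna().tail(3))
```

The first 21 values are NaN because the first log return is undefined and the window needs 21 valid returns.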

1.2 Accessing Real-Time Options Data

Now, let’s explore the heart of our data acquisition process: retrieving real-time options data using yfinance. We'll achieve this by employing the yf.Ticker.option_chain method:

import yfinance as yf

def get_option_data(stock_symbol, expiration_date, option_type, strike):
    """
    Retrieves current options data for a specific contract.

    Args:
        stock_symbol (str): Ticker symbol of the underlying asset.
        expiration_date (str): Expiry date of the option contract (YYYY-MM-DD format);
            must be one of the dates listed in yf.Ticker(stock_symbol).options.
        option_type (str): "call" or "put" for the desired option type.
        strike (float): Strike price of the option contract.

    Returns:
        pandas.DataFrame: Rows of the chain matching the strike (empty if none match).
    """
    stock = yf.Ticker(stock_symbol)
    option_chain = stock.option_chain(expiration_date)
    # option_chain exposes separate DataFrames for calls and puts
    options = option_chain.calls if option_type.startswith("call") else option_chain.puts
    option_data = options[options["strike"] == strike]
    return option_data

# Example: Retrieving call option data for AAPL expiring on 2024-03-22 with a strike price of $150
aapl_call_data = get_option_data("AAPL", "2024-03-22", "call", 150)

# Accessing specific data points like bid price and implied volatility
aapl_call_bid_price = aapl_call_data["bid"]
aapl_call_implied_volatility = aapl_call_data["impliedVolatility"]

This code defines a function get_option_data that takes the stock symbol, expiration date, option type, and strike price as arguments. It uses yf.Ticker.option_chain to retrieve the options chain for the specified date, selects the calls or the puts, and filters for the requested strike. It returns a pandas DataFrame with data points such as bid price, ask price, and implied volatility. The expiration date must be one of the dates in the ticker's options attribute, because yfinance raises an error for any other date. Once 2024-03-22 has passed, you will need to pick a current expiry.
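Matching strikes with == works when the strike is listed exactly. Otherwise it silently returns an empty DataFrame. Below is a minimal sketch of a fallback that picks the closest listed strike instead. The helper nearest_strike_rows is hypothetical, and the toy DataFrame only mimics the strike, bid, and impliedVolatility columns of a yfinance chain:

```python
import pandas as pd


def nearest_strike_rows(chain: pd.DataFrame, target_strike: float) -> pd.DataFrame:
    """Return the row(s) whose strike is closest to target_strike (hypothetical helper)."""
    distance = (chain["strike"] - target_strike).abs()
    return chain[distance == distance.min()]


# Toy stand-in for stock.option_chain(date).calls
calls = pd.DataFrame({
    "strike": [145.0, 147.5, 150.0, 152.5],
    "bid": [8.10, 6.20, 4.55, 3.10],
    "impliedVolatility": [0.24, 0.23, 0.22, 0.22],
})
print(nearest_strike_rows(calls, 149.0))
```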

By combining historical price data for the underlying asset with real-time options data, we can gather the necessary information to embark on our AI-powered option strategy generation journey.

2. Quantitative Analysis of Options Data

Having acquired real-time options data using yfinance, we now delve into the realm of quantitative analysis. This involves extracting valuable insights from the data to inform our AI model development and ultimately, option strategy generation.

2.1 Feature Engineering for Machine Learning

In the context of AI-powered option strategy generation, we’ll transform the raw options data into features suitable for machine learning algorithms. This process, known as feature engineering, plays a crucial role in extracting meaningful information and enabling the model to learn effective relationships between features and the desired outcome (e.g., predicting price movement).

Here are some key examples of features we can engineer from options data:

Basic Features:

Strike price
Expiration date (days to expiry)
Option type (call/put)
Bid price
Ask price
Implied volatility

Derived Features:

Delta: Measures the rate of change of the option’s price relative to the underlying asset’s price.
Gamma: Measures the rate of change of delta relative to the underlying asset’s price.
Theta: Measures how much an option’s price declines as time passes, all else being equal (time decay).
Vega: Measures the sensitivity of an option’s price to changes in implied volatility.

Technical Indicators:

Moving averages, Bollinger Bands, Relative Strength Index (RSI), etc. (calculated using historical price data)
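The technical indicators listed above can be computed from the underlying's closing prices with plain pandas. Below is a minimal sketch with the following assumptions:

- A 20-day window for the moving average and the Bollinger Bands.
- Bollinger Bands set at two standard deviations.
- A 14-day RSI based on simple rolling averages of gains and losses.

Wilder's original RSI uses exponential smoothing instead, so values will differ somewhat from charting platforms. The function name add_indicators is hypothetical:

```python
import numpy as np
import pandas as pd


def add_indicators(closes: pd.Series, window: int = 20, rsi_window: int = 14) -> pd.DataFrame:
    """Moving average, Bollinger Bands, and a simple-average RSI (hypothetical helper)."""
    out = pd.DataFrame({"close": closes})
    out["sma"] = closes.rolling(window).mean()
    rolling_std = closes.rolling(window).std()
    out["bb_upper"] = out["sma"] + 2 * rolling_std
    out["bb_lower"] = out["sma"] - 2 * rolling_std

    # Split day-over-day changes into gains and (positive) losses
    change = closes.diff()
    avg_gain = change.clip(lower=0).rolling(rsi_window).mean()
    avg_loss = change.clip(upper=0).abs().rolling(rsi_window).mean()
    rs = avg_gain / avg_loss
    out["rsi"] = 100 - 100 / (1 + rs)
    return out


closes = pd.Series(np.linspace(100, 130, 60))  # steadily rising toy prices
indicators = add_indicators(closes)
print(indicators.tail(3))
```

With prices that only rise, the average loss is zero, so the RSI saturates at 100.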

By incorporating these features, we provide the machine learning model with a comprehensive understanding of the options contract and its relationship with various market factors.
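To make the basic features concrete, here is a minimal sketch that derives four features from a chain-like DataFrame: days to expiry, mid price, relative bid-ask spread, and a call/put flag. The column names strike, bid, ask, and impliedVolatility match what yfinance returns. The helper basic_features, the fixed as_of date, and the toy rows are illustrative assumptions:

```python
import pandas as pd


def basic_features(chain: pd.DataFrame, expiration_date: str, option_type: str,
                   as_of: pd.Timestamp) -> pd.DataFrame:
    """Add simple per-contract features (hypothetical helper)."""
    feats = chain[["strike", "bid", "ask", "impliedVolatility"]].copy()
    feats["days_to_expiry"] = (pd.Timestamp(expiration_date) - as_of).days
    feats["mid"] = (feats["bid"] + feats["ask"]) / 2
    # Spread relative to the mid price; wide spreads usually signal illiquid contracts
    feats["rel_spread"] = (feats["ask"] - feats["bid"]) / feats["mid"]
    feats["is_call"] = int(option_type == "call")
    return feats


chain = pd.DataFrame({
    "strike": [145.0, 150.0],
    "bid": [7.90, 4.50],
    "ask": [8.10, 4.70],
    "impliedVolatility": [0.24, 0.22],
})
print(basic_features(chain, "2024-03-22", "call", pd.Timestamp("2024-03-17")))
```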

2.2 Calculating Option Greeks and Other Metrics

The option greeks (delta, gamma, theta, and vega) represent essential metrics that quantify the sensitivity of an option’s price to various factors. Calculating these greeks from the retrieved options data is crucial for feature engineering and understanding the option’s behavior.
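For reference, below are the Black-Scholes expressions implemented by the functions in this section, for a European call on a non-dividend-paying stock. The symbols are:

- S: spot price
- K: strike
- r: risk-free rate
- T: time to expiry in years
- σ: volatility
- N: standard normal CDF
- φ: standard normal density

Θ comes out per year and 𝒱 per unit (100 percentage points) of volatility.

```latex
\begin{aligned}
d_1 &= \frac{\ln(S/K) + \left(r + \tfrac{1}{2}\sigma^2\right)T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T} \\
\Delta_{\text{call}} &= N(d_1), \qquad \Gamma = \frac{\varphi(d_1)}{S\,\sigma\sqrt{T}} \\
\Theta_{\text{call}} &= -\frac{S\,\varphi(d_1)\,\sigma}{2\sqrt{T}} - rKe^{-rT}N(d_2), \qquad \mathcal{V} = S\,\varphi(d_1)\sqrt{T}
\end{aligned}
```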

Here’s an example of how you can calculate the delta of a European call option using the Black-Scholes option pricing model:

import datetime

import numpy as np
from scipy.stats import norm

def calculate_delta(stock_price, strike_price, risk_free_rate, time_to_expiry, implied_volatility):
    """
    Calculates the delta of a European call option using the Black-Scholes model.

    Args:
        stock_price (float): Current price of the underlying asset.
        strike_price (float): Strike price of the option contract.
        risk_free_rate (float): Risk-free interest rate.
        time_to_expiry (float): Time to expiry of the option in years.
        implied_volatility (float): Implied volatility of the option.

    Returns:
        float: The calculated delta value.
    """
    d1 = (np.log(stock_price / strike_price) + (risk_free_rate + 0.5 * implied_volatility**2) * time_to_expiry) / (implied_volatility * np.sqrt(time_to_expiry))
    return norm.cdf(d1)

# Calculating delta for the AAPL call option retrieved above (strike $150, expiry 2024-03-22)
aapl_call_delta = calculate_delta(
    stock_price=150,  # example underlying price; substitute the latest AAPL close
    strike_price=150,
    risk_free_rate=0.02,
    time_to_expiry=(datetime.datetime(year=2024, month=3, day=22) - datetime.datetime.now()).days / 365,
    implied_volatility=aapl_call_data["impliedVolatility"].iloc[0]
)

print(f"Delta of AAPL call option: {aapl_call_delta:.4f}")

This code snippet defines a function calculate_delta that implements the Black-Scholes formula to calculate the delta based on the provided parameters. It then calculates the delta for the AAPL call option data retrieved earlier and prints the result.
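Put delta follows from the same d1. Under Black-Scholes, put delta is N(d1) - 1, so call delta minus put delta always equals 1. Below is a minimal standard-library sketch. The function name bs_delta is hypothetical, and the normal CDF is written with math.erf so the snippet runs without SciPy:

```python
from math import erf, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_delta(S: float, K: float, r: float, T: float, sigma: float, option_type: str = "call") -> float:
    """Black-Scholes delta for a European call or put (hypothetical helper)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    if option_type == "call":
        return norm_cdf(d1)
    return norm_cdf(d1) - 1.0  # put delta lies between -1 and 0


call_delta = bs_delta(150, 150, 0.02, 30 / 365, 0.25, "call")
put_delta = bs_delta(150, 150, 0.02, 30 / 365, 0.25, "put")
print(f"call delta {call_delta:.4f}, put delta {put_delta:.4f}")
```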

Calculating Gamma:

def calculate_gamma(stock_price, strike_price, risk_free_rate, time_to_expiry, implied_volatility):
    """
    Calculates the gamma of an option using the Black-Scholes model
    (gamma is the same for calls and puts).

    Args:
        stock_price (float): Current price of the underlying asset.
        strike_price (float): Strike price of the option contract.
        risk_free_rate (float): Risk-free interest rate.
        time_to_expiry (float): Time to expiry of the option in years.
        implied_volatility (float): Implied volatility of the option.

    Returns:
        float: The calculated gamma value.
    """
    d1 = (np.log(stock_price / strike_price) + (risk_free_rate + 0.5 * implied_volatility**2) * time_to_expiry) / (implied_volatility * np.sqrt(time_to_expiry))
    # Gamma depends only on d1, through the standard normal density
    gamma = norm.pdf(d1) / (stock_price * implied_volatility * np.sqrt(time_to_expiry))
    return gamma

# Calculating gamma for the AAPL call option retrieved above (strike $150, expiry 2024-03-22)
aapl_call_gamma = calculate_gamma(
    stock_price=150,
    strike_price=150,
    risk_free_rate=0.02,
    time_to_expiry=(datetime.datetime(year=2024, month=3, day=22) - datetime.datetime.now()).days / 365,
    implied_volatility=aapl_call_data["impliedVolatility"].iloc[0]
)
print(f"Gamma of AAPL call option: {aapl_call_gamma:.4f}")

This code defines a function calculate_gamma that calculates the gamma of an option using the Black-Scholes model. It then calculates the gamma for the AAPL call option data and prints the result.
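Gamma is the rate of change of delta with respect to the underlying price, so you can sanity-check an implementation against a central finite difference of call delta. Below is a minimal sketch under the same Black-Scholes assumptions. The helper names and the bump size h are illustrative choices, not part of the original code:

```python
from math import erf, exp, log, pi, sqrt


def d1(S, K, r, T, sigma):
    """Black-Scholes d1 term."""
    return (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))


def call_delta(S, K, r, T, sigma):
    """Call delta N(d1), with the normal CDF written via math.erf."""
    return 0.5 * (1.0 + erf(d1(S, K, r, T, sigma) / sqrt(2.0)))


def gamma(S, K, r, T, sigma):
    """Analytic gamma: normal density at d1 over S * sigma * sqrt(T)."""
    pdf = exp(-0.5 * d1(S, K, r, T, sigma) ** 2) / sqrt(2.0 * pi)
    return pdf / (S * sigma * sqrt(T))


S, K, r, T, sigma, h = 150.0, 150.0, 0.02, 30 / 365, 0.25, 0.01
analytic = gamma(S, K, r, T, sigma)
# Central difference: change in delta per unit change in the underlying price
numeric = (call_delta(S + h, K, r, T, sigma) - call_delta(S - h, K, r, T, sigma)) / (2 * h)
print(f"analytic {analytic:.6f}  finite difference {numeric:.6f}")
```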

Calculating Theta:

def calculate_theta(stock_price, strike_price, risk_free_rate, time_to_expiry, implied_volatility):
    """
    Calculates the theta (per year) of a European call option using the Black-Scholes model.

    Args:
        stock_price (float): Current price of the underlying asset.
        strike_price (float): Strike price of the option contract.
        risk_free_rate (float): Risk-free interest rate.
        time_to_expiry (float): Time to expiry of the option in years.
        implied_volatility (float): Implied volatility of the option.

    Returns:
        float: The calculated theta value.
    """
    d1 = (np.log(stock_price / strike_price) + (risk_free_rate + 0.5 * implied_volatility**2) * time_to_expiry) / (implied_volatility * np.sqrt(time_to_expiry))
    d2 = d1 - implied_volatility * np.sqrt(time_to_expiry)
    # Volatility decay term plus the interest (carry) term on the discounted strike
    theta = -(stock_price * norm.pdf(d1) * implied_volatility) / (2 * np.sqrt(time_to_expiry)) - risk_free_rate * strike_price * np.exp(-risk_free_rate * time_to_expiry) * norm.cdf(d2)
    return theta

# Calculating theta for the AAPL call option retrieved above (strike $150, expiry 2024-03-22)
aapl_call_theta = calculate_theta(
    stock_price=150,
    strike_price=150,
    risk_free_rate=0.02,
    time_to_expiry=(datetime.datetime(year=2024, month=3, day=22) - datetime.datetime.now()).days / 365,
    implied_volatility=aapl_call_data["impliedVolatility"].iloc[0]
)
print(f"Theta of AAPL call option: {aapl_call_theta:.4f}")

This snippet introduces a function calculate_theta to determine the theta of an option using the Black-Scholes model. It then calculates the theta for the AAPL call option data and prints the result.
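Because T is measured in years, calculate_theta returns theta per year. Traders usually quote theta per calendar day, which is approximately the annual figure divided by 365. A put's theta differs from the call's by a carry term, and put-call parity implies Θ_call - Θ_put = -rKe^(-rT). Below is a minimal standard-library sketch that shows the per-day conversion and checks that identity. The function names are hypothetical:

```python
from math import erf, exp, log, pi, sqrt


def _cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def _pdf(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def bs_theta(S, K, r, T, sigma, option_type="call"):
    """Black-Scholes theta per year for a European option (hypothetical helper)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    decay = -(S * _pdf(d1) * sigma) / (2 * sqrt(T))
    if option_type == "call":
        return decay - r * K * exp(-r * T) * _cdf(d2)
    return decay + r * K * exp(-r * T) * _cdf(-d2)


S, K, r, T, sigma = 150.0, 150.0, 0.02, 30 / 365, 0.25
call_theta = bs_theta(S, K, r, T, sigma, "call")
put_theta = bs_theta(S, K, r, T, sigma, "put")
print(f"call theta per day: {call_theta / 365:.4f}")
print(f"put theta per day:  {put_theta / 365:.4f}")
```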

Calculating Vega:

def calculate_vega(stock_price, strike_price, risk_free_rate, time_to_expiry, implied_volatility):
    """
    Calculates the vega of an option using the Black-Scholes model
    (price change per 1.00 change in volatility; the same for calls and puts).

    Args:
        stock_price (float): Current price of the underlying asset.
        strike_price (float): Strike price of the option contract.
        risk_free_rate (float): Risk-free interest rate.
        time_to_expiry (float): Time to expiry of the option in years.
        implied_volatility (float): Implied volatility of the option.

    Returns:
        float: The calculated vega value.
    """
    d1 = (np.log(stock_price / strike_price) + (risk_free_rate + 0.5 * implied_volatility**2) * time_to_expiry) / (implied_volatility * np.sqrt(time_to_expiry))
    vega = stock_price * norm.pdf(d1) * np.sqrt(time_to_expiry)
    return vega

# Calculating vega for the AAPL call option retrieved above (strike $150, expiry 2024-03-22)
aapl_call_vega = calculate_vega(
    stock_price=150,
    strike_price=150,
    risk_free_rate=0.02,
    time_to_expiry=(datetime.datetime(year=2024, month=3, day=22) - datetime.datetime.now()).days / 365,
    implied_volatility=aapl_call_data["impliedVolatility"].iloc[0]
)
print(f"Vega of AAPL call option: {aapl_call_vega:.4f}")

This section introduces a function calculate_vega for computing the vega of an option using the Black-Scholes model. The vega for the AAPL call option is then calculated and printed.
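calculate_vega measures the price change for a 1.00 change in volatility, for example from 25% to 125%. It is therefore common to divide it by 100 to get the change per volatility point. Below is a minimal sketch that checks this against a finite difference of the Black-Scholes call price. bs_call_price and bs_vega are hypothetical helpers built from the standard Black-Scholes call formula:

```python
from math import erf, exp, log, pi, sqrt


def _cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call_price(S, K, r, T, sigma):
    """Black-Scholes price of a European call on a non-dividend-paying stock."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * _cdf(d1) - K * exp(-r * T) * _cdf(d2)


def bs_vega(S, K, r, T, sigma):
    """Black-Scholes vega per 1.00 change in volatility."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return S * exp(-0.5 * d1 * d1) / sqrt(2.0 * pi) * sqrt(T)


S, K, r, T, sigma = 150.0, 150.0, 0.02, 30 / 365, 0.25
vega_per_point = bs_vega(S, K, r, T, sigma) / 100  # price change per 1 vol point
# Reprice the call with volatility one point higher and compare
bumped = bs_call_price(S, K, r, T, sigma + 0.01) - bs_call_price(S, K, r, T, sigma)
print(f"vega per vol point {vega_per_point:.4f}, repriced difference {bumped:.4f}")
```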

In the next part of our tutorial, we will explore how to integrate these calculated metrics into a comprehensive feature set for training a machine learning model. This will set the stage for developing an AI-powered option strategy generator.

3. Model Development with Keras

In this section, we will delve into the process of developing our machine learning model using the Keras library. Our objective is to use the calculated option greeks and other relevant features to estimate option prices. We will use the JPMorgan Chase & Co. (JPM) ticker and include every available expiry date to build a broader dataset.

3.1 Building the Deep Learning Dataset

Before diving into model development, it’s crucial to create a well-structured dataset. We will use the code provided below to build a dataset that includes option information for call contracts of JPM with various expiry dates.

import pandas as pd
import yfinance as yf
import numpy as np
from scipy.stats import norm

RISK_FREE_RATE = 0.02  # flat annual rate, a simplifying assumption

def calculate_option_greeks(row, spot, r=RISK_FREE_RATE):
    """Black-Scholes call greeks for one contract; spot is the underlying's price."""
    try:
        T = row['time_to_expiry']  # in years
        sigma = row['impliedVolatility']
        d1 = (np.log(spot / row['strike']) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
        d2 = d1 - sigma * np.sqrt(T)

        delta = norm.cdf(d1)
        gamma = norm.pdf(d1) / (spot * sigma * np.sqrt(T))
        theta = -(spot * norm.pdf(d1) * sigma) / (2 * np.sqrt(T)) - r * row['strike'] * np.exp(-r * T) * norm.cdf(d2)
        vega = spot * norm.pdf(d1) * np.sqrt(T)

        return delta, gamma, theta, vega
    except Exception as e:
        # Return NaNs so that one bad row does not abort the whole loop
        print(f"Error calculating greeks: {e}")
        return np.nan, np.nan, np.nan, np.nan


# Initialize the yfinance Ticker object for JPM
ticker_symbol = "JPM"
jpm_ticker = yf.Ticker(ticker_symbol)

# Latest close of the underlying, used as the spot price S in Black-Scholes
spot = jpm_ticker.history(period="5d")["Close"].iloc[-1]

# Get available expiration dates
expiration_dates = jpm_ticker.options

# Collect one DataFrame per expiration date and concatenate once at the end
frames = []

# Iterate over expiration dates and build the dataset using only calls data
for expiration_date in expiration_dates:
    # Download the options data for JPM for the current expiration date and keep only the calls
    calls = jpm_ticker.option_chain(expiration_date).calls
    # Skip contracts without a usable implied volatility (the formulas divide by it)
    jpm_calls = calls[calls['impliedVolatility'] > 0].copy()
    if jpm_calls.empty:
        continue

    # Add the expiration date as a column
    jpm_calls['expiration_date'] = expiration_date

    # Moneyness of a call: underlying price relative to strike
    jpm_calls['Moneyness'] = spot / jpm_calls['strike']

    # Time to expiry in years, counting the expiry day and flooring at one day
    days = (pd.to_datetime(expiration_date) - pd.Timestamp.now()).days + 1
    jpm_calls['time_to_expiry'] = max(days, 1) / 365

    # Calculate option greeks using the function above
    jpm_calls[['delta', 'gamma', 'theta', 'vega']] = jpm_calls.apply(
        calculate_option_greeks, axis=1, result_type='expand', args=(spot,)
    )

    frames.append(jpm_calls)

# Combine all expiration dates into one dataset
options_dataset = pd.concat(frames, ignore_index=True)

# Display the first few rows of the dataset
options_dataset.head()

This code initializes a Ticker object for JPM using yfinance, takes the latest closing price as the spot price, retrieves the available expiration dates, and collects the call option data for each date. The Black-Scholes inputs are the underlying's spot price (not the option's own lastPrice) and the time to expiry in years. The resulting dataset includes the calculated greeks (delta, gamma, theta, vega), moneyness (spot divided by strike), time to expiry, and the columns yfinance already provides, such as strike, lastPrice, bid, ask, and impliedVolatility.
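Row-wise apply is easy to read, but it calls Python once per contract. Below is a minimal vectorized sketch of the same call greeks over whole arrays. It assumes spot is the underlying's price, T is in years, and r is a flat 2% rate. The function name call_greeks_vectorized and the example numbers are hypothetical:

```python
import numpy as np
import pandas as pd
from scipy.stats import norm


def call_greeks_vectorized(spot, strike, T, iv, r=0.02):
    """Black-Scholes call greeks for whole arrays of contracts (hypothetical helper)."""
    strike, T, iv = (np.asarray(a, dtype=float) for a in (strike, T, iv))
    sqrt_T = np.sqrt(T)
    d1 = (np.log(spot / strike) + (r + 0.5 * iv**2) * T) / (iv * sqrt_T)
    d2 = d1 - iv * sqrt_T
    return pd.DataFrame({
        "delta": norm.cdf(d1),
        "gamma": norm.pdf(d1) / (spot * iv * sqrt_T),
        "theta": -(spot * norm.pdf(d1) * iv) / (2 * sqrt_T) - r * strike * np.exp(-r * T) * norm.cdf(d2),
        "vega": spot * norm.pdf(d1) * sqrt_T,
    })


# Made-up spot, strikes, and volatilities for illustration
greeks = call_greeks_vectorized(
    spot=195.0,
    strike=[180.0, 195.0, 210.0],
    T=[30 / 365] * 3,
    iv=[0.22, 0.20, 0.19],
)
print(greeks.round(4))
```

To use it in the dataset loop, pass the spot price together with the strike, time-to-expiry (in years), and impliedVolatility columns, and assign the four returned columns in one step instead of calling apply.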

In the subsequent sections, we will use this dataset to train a deep learning model for option price prediction.

3.2 Building and Training the Deep Learning Model

Now that we have acquired and quantitatively analyzed options data for JPMorgan (JPM) using the yfinance library, the next step is to develop a machine learning model using Keras. Our goal is to leverage deep learning techniques to predict option prices and facilitate the generation of effective option trading strategies.

Let’s proceed with the model development, using the calculated option greeks and other relevant features from our dataset. We will use a neural network for regression, since option prices are continuous values.

import numpy as np
from keras.models import Sequential
from keras.layers import Input, Dense, Dropout
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Define the features for the model
features = ['strike', 'Moneyness', 'time_to_expiry', 'impliedVolatility', 'delta', 'gamma', 'theta', 'vega']

# Select the target variable
target_variable = 'lastPrice'

# Drop rows with missing or infinite values in the model columns only
# (a blanket dropna() would also discard rows with an empty 'volume' field)
options_dataset = options_dataset.replace([np.inf, -np.inf], np.nan).dropna(subset=features + [target_variable])

# Prepare the feature matrix and target variable
X = options_dataset[features]
y = options_dataset[target_variable]

# Split first, then fit the scaler on the training set only so test statistics do not leak into training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Define the neural network model
model = Sequential([
    Input(shape=(len(features),)),
    Dense(128, activation='relu'),
    Dropout(0.2),
    Dense(64, activation='relu'),
    Dropout(0.2),
    Dense(1, activation='linear'),
])

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model, holding out 20% of the training data for validation;
# keep the returned History object for plotting the loss curves
history = model.fit(X_train_scaled, y_train.to_numpy(), epochs=50, batch_size=32, validation_split=0.2)

In this code block, we use Keras to build a feed-forward neural network for regression. It has two hidden layers (128 and 64 units) with ReLU activations, each followed by a dropout layer to reduce overfitting. A single linear output unit produces the predicted price. Mean squared error is the loss function, and the Adam optimizer is used for training.

Before training, we split the dataset into training and testing sets and then standardize the features. The scaler is fitted on the training set only and then applied to the test set, so no statistics from the test set leak into training. Validation uses 20% of the training data, which keeps the test set untouched until the final evaluation.

Now, let’s evaluate the model’s performance on the testing set and analyze its predictive capabilities:

# Evaluate the model on the testing set
mse = model.evaluate(X_test_scaled, y_test)
print(f'Mean Squared Error on Testing Set: {mse}')

# Predict option prices on the testing set
predictions = model.predict(X_test_scaled)

# Compare predictions with actual prices
comparison_df = pd.DataFrame({'Actual Price': y_test.values, 'Predicted Price': predictions.flatten()})
print(comparison_df.head())

This code block evaluates the model on the testing set by calculating the mean squared error (MSE) and lists the first few actual prices next to the predicted ones. The output below is from the original run. Your numbers will differ, since the options chain changes every trading day:

loss: 31.2149 
Mean Squared Error on Testing Set: 33.98479461669922

   Actual Price  Predicted Price
0         20.05        19.758265
1          0.07        -0.051028
2          8.42         8.334528
3         14.70        10.833565
4         19.33        19.095814

By training a neural network on a snapshot of the current JPM options chain, we obtain a model that estimates an option's price from its contract features and greeks.
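An MSE of about 34 is hard to judge on its own, because option prices in the dataset range from a few cents to tens of dollars. One way to read it is to compare it against a naive baseline that always predicts the training-set mean price. Below is a minimal sketch using NumPy. All arrays are illustrative stand-ins: y_test and predictions loosely copy the five sample rows above, and y_train is made up:

```python
import numpy as np


def mse(actual, predicted):
    """Mean squared error between two array-likes."""
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))


# Illustrative values only
y_train = np.array([0.5, 3.0, 8.0, 15.0, 22.0])
y_test = np.array([20.05, 0.07, 8.42, 14.70, 19.33])
predictions = np.array([19.76, -0.05, 8.33, 10.83, 19.10])

baseline_mse = mse(y_test, np.full_like(y_test, y_train.mean()))
model_mse = mse(y_test, predictions)
# Share of the baseline error the model removes (1.0 would be a perfect fit)
skill = 1 - model_mse / baseline_mse
print(f"baseline MSE {baseline_mse:.2f}, model MSE {model_mse:.2f}, skill {skill:.2f}")
```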

3.3 Visualization

For visualization, we will use matplotlib and seaborn. First, we plot the training and validation loss over epochs, which shows how the model's learning progresses.

import matplotlib.pyplot as plt

# Plot training and validation loss over epochs
plt.figure(figsize=(12, 6))
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Training and Validation Loss Over Epochs')
plt.xlabel('Epochs')
plt.ylabel('Mean Squared Error')
plt.legend()
plt.show()

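The plot shows the mean squared error (MSE) on the training and validation sets after each epoch. If both curves fall and then level off close to each other, the model is not obviously overfitting. A validation loss that rises while the training loss keeps falling is the classic sign of overfitting. Converging curves alone do not prove that the model will generalize to new market conditions.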

Now, let’s visualize the scatter plot between predicted and actual option prices on the testing set.

import seaborn as sns
import matplotlib.pyplot as plt

# Scatter plot with a diagonal line for predictions vs. actual prices
plt.figure(figsize=(10, 8))
sns.scatterplot(x=y_test, y=predictions.flatten())
plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], linestyle='--', color='red', label='Diagonal Line')
plt.title('Scatter Plot of Predictions vs. Actual Prices')
plt.xlabel('Actual Prices')
plt.ylabel('Predicted Prices')
plt.legend()
plt.show()

The scatter plot above visually compares the predicted option prices against the actual prices on the testing set. The diagonal line represents a perfect match between predictions and actual values, so deviations from this line indicate the model’s predictive error.

Predictions vs Actual Prices. Created by Author.

4. Option Strategy Generation and Backtesting

Now that we have trained our deep learning model on JPMorgan options data, we can use it to generate trading signals and examine how a simple strategy built on them would perform.

In this section, we generate and test an option trading strategy using the deep learning model developed above. The code pulls historical prices of JPMorgan (JPM) from yfinance, aligns them with the previously created options_dataset, generates signals from the model's predictions, and summarizes the resulting strategy returns in a table.

Let’s break down the code and explain each step:

# Get historical prices for JPM
jpm_prices = jpm_ticker.history(period='1y', interval='1d')['Close']

Here, we obtain historical daily closing prices for JPMorgan over the past year. This historical price data is crucial for backtesting our option trading strategy.

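# Use each contract's last trade time as its timestamp, converted to New York time
# and normalized to midnight to match the daily index of jpm_prices
trade_times = pd.to_datetime(options_dataset['lastTradeDate'], utc=True).dt.tz_convert('America/New_York')
options_dataset.index = trade_times.dt.normalize()

# Order rows chronologically by last trade
options_dataset = options_dataset.sort_index()

# Reindex jpm_prices with the options_dataset index
jpm_prices = jpm_prices.reindex(options_dataset.index, method='ffill')

This segment aligns the two datasets in time. options_dataset was built with ignore_index=True, so its index is just 0, 1, 2, and so on. Converting those integers with pd.to_datetime would produce timestamps at the very start of 1970 rather than trade dates. Instead, we take each contract's lastTradeDate column (provided by yfinance), convert it to the 'America/New_York' time zone, and sort the rows by it. We then reindex JPMorgan's daily closes to these dates with forward filling ('ffill'), so each contract is paired with the most recent close on or before its last trade date.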
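Forward-fill reindexing pairs each target timestamp with the most recent price at or before it. Timestamps earlier than the first available price come back as NaN. Below is a minimal sketch on toy data. It includes a Saturday that falls back to Friday's close and a date before the price history starts. The dates and prices are made up:

```python
import pandas as pd

# Toy daily closes indexed like yfinance history output (tz-aware, midnight New York time)
prices = pd.Series(
    [150.0, 151.5, 149.8],
    index=pd.DatetimeIndex(["2024-03-13", "2024-03-14", "2024-03-15"], tz="America/New_York"),
)

# Target dates: one before the data starts, two contracts traded Thursday, one row dated Saturday
targets = pd.DatetimeIndex(
    ["2024-03-12", "2024-03-14", "2024-03-14", "2024-03-16"], tz="America/New_York"
)

aligned = prices.reindex(targets, method="ffill")
print(aligned)
```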

# Calculate strategy signals: scale the features with the fitted scaler, then predict
options_dataset['Predicted_Price'] = model.predict(scaler.transform(options_dataset[features])).flatten()
options_dataset['Signal'] = np.where(options_dataset['Predicted_Price'] > options_dataset['lastPrice'], 1, 0)

Now we scale the features with the scaler fitted earlier and predict option prices with the trained model. The signal follows a simple rule: 1 (buy and hold) if the predicted price is higher than the current option price, and 0 (sell and hold cash) otherwise. Keep in mind that most of these rows were also used for training, so these predictions are largely in-sample.

# Simple strategy: hold the option if the previous signal is 1, hold cash if it is 0
options_dataset['Position'] = options_dataset['Signal'].shift(1).fillna(0)  # first row has no prior signal: hold cash

This block turns signals into trading positions. Each row's position is the previous row's signal, so the strategy never acts on a prediction before it is made. A position of 1 means holding the option, and 0 means holding cash. The first row has no previous signal and starts in cash.
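The shift(1) call keeps the logic free of look-ahead: the position held on a row is the signal from the row before it. Below is a minimal sketch on toy Series with made-up prices. It assumes 1 means "hold the option" and 0 means "hold cash":

```python
import numpy as np
import pandas as pd

predicted = pd.Series([5.2, 4.8, 6.1, 5.9])
last_price = pd.Series([5.0, 5.0, 6.0, 6.0])

# 1 = hold the option, 0 = hold cash (assumed convention for this sketch)
signal = pd.Series(np.where(predicted > last_price, 1, 0), index=predicted.index)
# Act on the previous row's signal; the first row has none, so it starts in cash
position = signal.shift(1).fillna(0).astype(int)
print(pd.DataFrame({"signal": signal, "position": position}))
```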

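# Calculate per-row percentage returns based on the strategy
options_dataset['Strategy_Return'] = options_dataset['Position'] * options_dataset['lastPrice'].pct_change()
options_dataset['Cumulative_Strategy_Return'] = (1 + options_dataset['Strategy_Return']).cumprod() - 1

Here, we multiply the percentage change in lastPrice from the previous row by the position to get each row's strategy return, and then compound these returns into a cumulative strategy return. Keep in mind what this measures. Consecutive rows are different contracts (different strikes and expiries), not one contract observed day after day. The cumulative figure therefore illustrates the mechanics of turning signals into returns rather than the performance of a tradable strategy. A genuine backtest would need a price history for each contract, which, as noted in section 1.1, yfinance does not provide.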
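Cumulative return compounds the per-row returns rather than adding them. Below is a minimal worked sketch with made-up prices and positions. Holding through a move from 2.00 to 2.20 (+10%) and then from 2.20 to 1.98 (-10%) leaves you at -1%, not 0%:

```python
import pandas as pd

price = pd.Series([2.00, 2.20, 1.98])
position = pd.Series([0, 1, 1])  # already lagged: cash on the first row, then hold

# The first return is NaN (no previous price); cumprod skips it and compounds the rest
strategy_return = position * price.pct_change()
cumulative = (1 + strategy_return).cumprod() - 1
print(cumulative)
```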

# Display the first few rows of the options dataset with strategy returns
print(options_dataset[['lastPrice', 'Predicted_Price', 'Signal', 'Position', 'Strategy_Return', 'Cumulative_Strategy_Return']].head())

Finally, the code displays the first rows of the options dataset with the strategy-related columns. This shows how prices, signals, positions, and strategy returns line up row by row.

5. Conclusion and Future Directions

In this tutorial, we have embarked on a journey to develop an AI-powered option trading strategy using Python. We covered various crucial steps:

Data Acquisition: We utilized the yfinance library to acquire historical price data for JPMorgan and real-time options data. This data served as the foundation for our quantitative analysis.
Quantitative Analysis: We performed feature engineering to transform raw options data into features suitable for machine learning. Additionally, we calculated option greeks using the Black-Scholes model, providing essential metrics for understanding option behavior.
Model Development: A deep learning model was developed using the Keras library for predicting option prices. The model incorporated various features, including strike price, moneyness, time to expiry, implied volatility, and option greeks.
Option Strategy Generation and Backtesting: We generated trading signals from the model's predictions and walked through the mechanics of a simple strategy. It holds an option when the model's predicted price exceeds the option's last traded price and holds cash otherwise.

5.1 Future Directions:

While our tutorial provides a solid foundation, there are several avenues for further exploration and improvement:

Model Enhancement: Consider experimenting with different neural network architectures, hyperparameter tuning, or exploring alternative machine learning algorithms to improve prediction accuracy.
Feature Engineering: Explore additional features or technical indicators that might enhance the model’s understanding of market dynamics. This could include sentiment analysis of financial news or incorporating macroeconomic indicators.
Risk Management: Integrate robust risk management techniques into the trading strategy. This might involve optimizing position sizes, setting stop-loss orders, or incorporating volatility-based risk metrics.
Live Trading Implementation: Transition the strategy from backtesting to live trading in a simulated environment. Ensure that the strategy is resilient to real-world market conditions and latency.
Dynamic Strategy Adaptation: Implement mechanisms for the strategy to adapt dynamically to changing market conditions. This could involve retraining the model periodically or incorporating reinforcement learning techniques.
Community and Collaboration: Engage with the trading and AI communities to share insights, gather feedback, and collaborate on improving the strategy. Platforms like GitHub provide an excellent space for open-source collaboration.

In conclusion, developing an AI-powered option trading strategy is a complex and dynamic process. It requires a combination of quantitative analysis, machine learning expertise, and a deep understanding of financial markets. As you continue your exploration in this domain, keep refining your models, adapting to market dynamics, and staying informed about the latest advancements in both AI and finance.