Question 7: Use the Share Market dataset to predict ups and downs in the market and to predict future stock price returns, based on Indian market data from 2000 to 2020. (Mini Project)
How can you use machine learning to predict stock market ups and downs using Indian market data (2000–2020)?
To predict market movement (ups or downs), you can treat the problem as a classification task:
- Load the dataset and clean it — handle missing values, correct data types.
- Feature Engineering: Create indicators like moving averages, RSI, MACD, returns, and volatility.
- Label Creation: Generate a binary column — e.g., if Close(t+1) > Close(t) then label = 1 (Up), else 0 (Down).
- Split the dataset into train and test sets.
- Apply classification algorithms like:
  - Logistic Regression
  - Random Forest
  - XGBoost
- Evaluate models using accuracy, precision, recall, and AUC.
This enables the prediction of directional movement in stock prices.
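The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the price series is a synthetic random walk standing in for the real dataset, the helper names (`rsi`, `build_dataset`) and window lengths are assumptions, and the RSI uses simple moving averages rather than Wilder's smoothing. XGBoost is left out to avoid an extra dependency.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a synthetic random-walk price series stands in for the real data.
rng = np.random.default_rng(42)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, 1500))))


def rsi(prices: pd.Series, window: int = 14) -> pd.Series:
    """RSI based on simple moving averages of gains and losses."""
    delta = prices.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    return 100 - 100 / (1 + gain / loss)


def build_dataset(prices: pd.Series) -> pd.DataFrame:
    """Create indicator features and the next-day Up/Down label."""
    df = pd.DataFrame({"Close": prices})
    df["return_1d"] = df["Close"].pct_change()
    df["sma_10_over_50"] = df["Close"].rolling(10).mean() / df["Close"].rolling(50).mean()
    df["volatility_20"] = df["return_1d"].rolling(20).std()
    df["rsi_14"] = rsi(df["Close"])
    ema_12 = df["Close"].ewm(span=12, adjust=False).mean()
    ema_26 = df["Close"].ewm(span=26, adjust=False).mean()
    df["macd"] = ema_12 - ema_26
    # Label: 1 (Up) if Close(t+1) > Close(t), else 0 (Down).
    df["label"] = (df["Close"].shift(-1) > df["Close"]).astype(int)
    # The last row has no next-day close; early rows lack indicator history.
    return df.iloc[:-1].dropna()


df = build_dataset(close)
features = ["return_1d", "sma_10_over_50", "volatility_20", "rsi_14", "macd"]

# Chronological split: train on the first 80%, test on the most recent 20%.
split = int(len(df) * 0.8)
X_train, X_test = df[features].iloc[:split], df[features].iloc[split:]
y_train, y_test = df["label"].iloc[:split], df["label"].iloc[split:]

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]  # probability of class 1 (Up)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "auc": roc_auc_score(y_test, proba),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```

The split is chronological rather than shuffled so that the model is never trained on days that come after the ones it is tested on. If the xgboost package is installed, an `xgboost.XGBClassifier` could be added to the `models` dictionary in the same way.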
How can regression models be used to predict future stock returns, and what are the key evaluation metrics?
To predict future stock returns, treat the problem as a regression task:
- Preprocess historical stock data (e.g., NIFTY, BSE) from 2000–2020.
- Calculate target: Future return = (Close(t+5) - Close(t)) / Close(t).
- Features may include:
  - Price lag values
  - Volume
  - Technical indicators (e.g., SMA, EMA)
- Train regression models:
  - Linear Regression
  - Random Forest Regressor
  - LSTM (optional for time series deep learning)
- Evaluate the models using:
  - R² Score
  - RMSE (Root Mean Squared Error)
  - MAE (Mean Absolute Error)
These metrics help assess the accuracy and reliability of stock return predictions.
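A minimal sketch of this regression workflow is shown below. It is an illustration under stated assumptions: the prices and volumes are synthetic, the feature names, lag choices and 20-day windows are hypothetical, and the LSTM option is omitted. A gap of `HORIZON` rows is left between train and test because the 5-day targets of neighbouring rows overlap.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Assumption: synthetic prices and volumes stand in for the real index data.
rng = np.random.default_rng(7)
n = 1500
prices = pd.DataFrame({
    "Close": 100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, n))),
    "Volume": rng.integers(1_000_000, 5_000_000, n).astype(float),
})

HORIZON = 5  # days ahead, as in Close(t+5)

df = prices.copy()
# Target: future return = (Close(t+5) - Close(t)) / Close(t)
df["future_return"] = (df["Close"].shift(-HORIZON) - df["Close"]) / df["Close"]

# Features: lagged prices relative to today's close, volume change, SMA and EMA ratios.
for lag in (1, 2, 5):
    df[f"close_lag_{lag}"] = df["Close"].shift(lag) / df["Close"]
df["volume_change"] = df["Volume"].pct_change()
df["close_over_sma_20"] = df["Close"] / df["Close"].rolling(20).mean()
df["close_over_ema_20"] = df["Close"] / df["Close"].ewm(span=20, adjust=False).mean()
df = df.dropna()  # drops warm-up rows and the last HORIZON rows without a target

features = ["close_lag_1", "close_lag_2", "close_lag_5",
            "volume_change", "close_over_sma_20", "close_over_ema_20"]

# Chronological split with a gap so no training target looks into the test period.
split = int(len(df) * 0.8)
train, test = df.iloc[:split - HORIZON], df.iloc[split:]

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(train[features], train["future_return"])
    pred = model.predict(test[features])
    scores[name] = {
        "r2": r2_score(test["future_return"], pred),
        "rmse": float(np.sqrt(mean_squared_error(test["future_return"], pred))),
        "mae": mean_absolute_error(test["future_return"], pred),
    }
    print(name, {k: round(v, 4) for k, v in scores[name].items()})
```

On a synthetic random walk like this one there is nothing to learn, so R² will typically be close to zero or negative. That makes it a useful baseline to compare against when the same code is run on real market data.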
Programming Code: Write the following code in ML_P07.py
# ML Project Program 07
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score
nRowRead = 1000
data = pd.read_csv('./Indian_stock_market_dataset/Indian_stock_market.csv', delimiter=',', nrows = nRowRead)
data.dataframeName = 'Indian_stock_market.csv'
nRow, nCol = data.shape
# In a script, results must be printed explicitly to be visible
data.info()
print(data.describe())
print(data.columns)
# check null values
print(data.isnull().sum())
data['Adj Close'].plot()
plt.show()
# set Target variable
output_var = pd.DataFrame(data['Adj Close'])
# selecting the Features
features = ["Open", "High", "Low", "Volume"]
# Normalizing Dataset
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
feature_transform = scaler.fit_transform(data[features])
feature_transform = pd.DataFrame(columns = features, data = feature_transform, index = data.index)
feature_transform.info()
from sklearn.model_selection import TimeSeriesSplit
time_split = TimeSeriesSplit(n_splits = 10)
# Each iteration overwrites the previous split, so the last (largest) split is used below
for train_index, test_index in time_split.split(feature_transform):
    x_train, x_test = feature_transform[:len(train_index)], feature_transform[len(train_index): (len(train_index) + len(test_index))]
    y_train, y_test = output_var[:len(train_index)].values.ravel(), output_var[len(train_index): (len(train_index) + len(test_index))].values.ravel()
# Process the data for LSTM
trainX = np.array(x_train)
testX = np.array(x_test)
x_train = trainX.reshape(x_train.shape[0],1, x_train.shape[1])
x_test = testX.reshape(x_test.shape[0], 1, x_test.shape[1])
from keras.models import Sequential
from keras.layers import Dense, LSTM
# keras.utils.vis_utils is not available in recent Keras releases
from keras.utils import plot_model
# Building LSTM Model
# One LSTM layer over a single time step; input shape is (1, number of features)
lstm = Sequential()
lstm.add(LSTM(32, input_shape=(1, trainX.shape[1]), activation="relu", return_sequences = False))
lstm.add(Dense(1))
lstm.compile(loss = "mean_squared_error", optimizer = "adam")
# plot_model needs the pydot package and Graphviz to be installed
plot_model(lstm, show_shapes = True, show_layer_names = True)
history = lstm.fit(x_train, y_train, epochs = 100, batch_size = 8, verbose = 1, shuffle = False)
# LSTM Prediction
y_pred = lstm.predict(x_test)
# Prediction vs True Adj Close Value - LSTM
plt.plot(y_test, label = "True Value")
plt.plot(y_pred, label = "LSTM Value")
plt.title("Prediction by LSTM Model")
plt.xlabel("Time Scale")
# The target (Adj Close) is not scaled; only the input features are
plt.ylabel("Adj Close")
plt.legend()
plt.show()
# R2 Score (r2_score is already imported from sklearn.metrics at the top)
print("R2 score:", r2_score(y_test, y_pred))
# R2 score is: 0.87
# Thanks For Reading.
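The program above keeps only the final split produced by TimeSeriesSplit and fits the MinMaxScaler on all rows before splitting. A hedged sketch of an alternative is given below: it scores every fold and fits the scaler on the training rows of each fold only. The data is synthetic (an assumption, since the CSV is not available here), and a plain LinearRegression stands in for the LSTM so the example stays small and quick to run.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: synthetic data with the same column names as the CSV used above.
rng = np.random.default_rng(0)
n = 1000
close = 100 * np.exp(np.cumsum(rng.normal(0.0003, 0.01, n)))
data = pd.DataFrame({
    "Open": close * (1 + rng.normal(0, 0.002, n)),
    "High": close * (1 + np.abs(rng.normal(0, 0.005, n))),
    "Low": close * (1 - np.abs(rng.normal(0, 0.005, n))),
    "Volume": rng.integers(1_000_000, 5_000_000, n).astype(float),
    "Adj Close": close,
})

features = ["Open", "High", "Low", "Volume"]
X = data[features].to_numpy()
y = data["Adj Close"].to_numpy()

time_split = TimeSeriesSplit(n_splits=10)
fold_scores = []
for fold, (train_index, test_index) in enumerate(time_split.split(X), start=1):
    # The pipeline fits the scaler on the training rows of this fold only,
    # so the test rows never influence the scaling.
    model = make_pipeline(MinMaxScaler(), LinearRegression())
    model.fit(X[train_index], y[train_index])
    score = r2_score(y[test_index], model.predict(X[test_index]))
    fold_scores.append(score)
    print(f"Fold {fold}: train={len(train_index)} test={len(test_index)} R2={score:.3f}")

print("Mean R2 over folds:", round(float(np.mean(fold_scores)), 3))
```

Because the features are the same day's Open, High and Low, a high R² here mostly reflects that the closing price lies near those values. It should not be read as evidence that future prices can be forecast.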