Machine Learning Program / Project - 07

Question 7: Use the share market dataset to predict the ups and downs of the market and to predict future stock price returns, based on Indian market data from 2000 to 2020. (Mini Project)

How can you use machine learning to predict stock market ups and downs using Indian market data (2000–2020)?
To predict market movement (ups or downs), you can treat the problem as a classification task:
  1. Load the dataset and clean it: handle missing values and correct data types.
  2. Feature Engineering: Create indicators like moving averages, RSI, MACD, returns, and volatility.
  3. Label Creation: Generate a binary column, e.g., if Close(t+1) > Close(t) then label = 1 (Up), else 0 (Down).
  4. Split the dataset into train and test sets.
  5. Apply classification algorithms like:
    • Logistic Regression
    • Random Forest
    • XGBoost
  6. Evaluate models using accuracy, precision, recall, and AUC.
This enables the prediction of directional movement in stock prices.
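The steps above can be sketched in Python as follows. This is a minimal sketch, not the author's code: it generates a synthetic random-walk Close series in place of the Indian market CSV (an assumption), computes a few of the listed indicators with pandas (a 14-period RSI using simple rolling means, and MACD as the 12/26-period EMA difference), labels each row Up (1) or Down (0) from the next day's Close, splits chronologically, and compares Logistic Regression with Random Forest. The helper names `add_indicators` and `make_labels` are hypothetical. XGBoost is left out to avoid an extra dependency; `xgboost.XGBClassifier` could be added to the `models` dict in the same way if that package is installed.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def add_indicators(df):
    """Hypothetical helper: add simple technical indicators computed from Close."""
    out = df.copy()
    close = out["Close"]
    out["Return"] = close.pct_change()
    out["SMA_10"] = close.rolling(10).mean()
    out["Volatility_10"] = out["Return"].rolling(10).std()
    # RSI (14) from simple rolling averages of gains and losses
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["RSI_14"] = 100 - 100 / (1 + gain / loss)
    # MACD: 12-period EMA minus 26-period EMA
    out["MACD"] = close.ewm(span=12, adjust=False).mean() - close.ewm(span=26, adjust=False).mean()
    return out


def make_labels(df):
    """Hypothetical helper: Label = 1 if the next day's Close is above today's, else 0."""
    out = df.copy()
    out["Label"] = (out["Close"].shift(-1) > out["Close"]).astype(int)
    return out.iloc[:-1]  # the last row has no next-day Close


# Synthetic random-walk prices stand in for the real dataset (assumption)
rng = np.random.default_rng(0)
close = 1000 * np.exp(np.cumsum(rng.normal(0, 0.01, 1500)))
prices = pd.DataFrame({"Close": close})

data = make_labels(add_indicators(prices)).dropna()
features = ["Return", "SMA_10", "Volatility_10", "RSI_14", "MACD"]

# Chronological split: first 80% for training, last 20% for testing (no shuffling)
split = int(len(data) * 0.8)
train, test = data.iloc[:split], data.iloc[split:]

models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
results = {}
for name, model in models.items():
    model.fit(train[features], train["Label"])
    pred = model.predict(test[features])
    proba = model.predict_proba(test[features])[:, 1]  # probability of "Up"
    results[name] = {
        "accuracy": accuracy_score(test["Label"], pred),
        "precision": precision_score(test["Label"], pred, zero_division=0),
        "recall": recall_score(test["Label"], pred, zero_division=0),
        "auc": roc_auc_score(test["Label"], proba),
    }
    print(name, results[name])
```

On random-walk data there is no signal to learn, so accuracy and AUC should land near 0.5; with real market data, that is the baseline an Up/Down model needs to beat.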

How can regression models be used to predict future stock returns, and what are the key evaluation metrics?
To predict future stock returns, treat the problem as a regression task:
  1. Preprocess historical stock data (e.g., NIFTY 50 or BSE SENSEX index data) from 2000–2020.
  2. Calculate the target, the 5-day future return: Future return = (Close(t+5) - Close(t)) / Close(t).
  3. Features may include:
    • Price lag values
    • Volume
    • Technical indicators (e.g., SMA, EMA)
  4. Train regression models:
    • Linear Regression
    • Random Forest Regressor
    • LSTM (optional for time series deep learning)
  5. Evaluate the models using:
    • R² Score
    • RMSE (Root Mean Squared Error)
    • MAE (Mean Absolute Error)
These metrics help assess the accuracy and reliability of stock return predictions.
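A minimal sketch of this regression workflow follows, again on synthetic data standing in for the 2000–2020 Indian market series (an assumption). It builds the 5-day future-return target from the formula above, uses lagged daily returns (a scale-free form of price lags), volume change, and the distance of Close from a 10-day SMA and EMA as features, then reports R², RMSE, and MAE for Linear Regression and a Random Forest Regressor. The function `build_dataset` and its parameters are hypothetical, and the optional LSTM is left out. The last `HORIZON` training rows are dropped so that their 5-day targets do not reach into the test period.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

HORIZON = 5  # predict the return over the next 5 rows (trading days)


def build_dataset(df, horizon=HORIZON, n_lags=5):
    """Hypothetical helper: lag, volume and moving-average features plus the target."""
    out = pd.DataFrame(index=df.index)
    daily_ret = df["Close"].pct_change()
    for lag in range(n_lags):
        # ret_lag_0 is today's return; ret_lag_k is the return k days earlier
        out[f"ret_lag_{lag}"] = daily_ret.shift(lag)
    out["vol_change"] = df["Volume"].pct_change()
    out["close_to_sma_10"] = df["Close"] / df["Close"].rolling(10).mean() - 1
    out["close_to_ema_10"] = df["Close"] / df["Close"].ewm(span=10, adjust=False).mean() - 1
    # Target: (Close(t+5) - Close(t)) / Close(t)
    out["target"] = (df["Close"].shift(-horizon) - df["Close"]) / df["Close"]
    return out.dropna()


# Synthetic prices and volumes stand in for the real dataset (assumption)
rng = np.random.default_rng(1)
n = 1500
close = 1000 * np.exp(np.cumsum(rng.normal(0, 0.01, n)))
volume = rng.integers(100_000, 1_000_000, n)
prices = pd.DataFrame({"Close": close, "Volume": volume})

data = build_dataset(prices)
X, y = data.drop(columns="target"), data["target"]

# Chronological split with a gap so training targets do not overlap the test period
split = int(len(data) * 0.8)
X_train, y_train = X.iloc[:split - HORIZON], y.iloc[:split - HORIZON]
X_test, y_test = X.iloc[split:], y.iloc[split:]

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "r2": r2_score(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": mean_absolute_error(y_test, pred),
    }
    print(name, scores[name])
```

RMSE and MAE are in return units (e.g. 0.01 = 1%), which makes them easier to compare across stocks than errors on raw prices.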

Programming Code:
Write the following code in ML_P07.py:
# ML Project Program 07 

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score

# Read the first 1000 rows of the dataset
nRowRead = 1000
data = pd.read_csv('./Indian_stock_market_dataset/Indian_stock_market.csv', delimiter=',', nrows = nRowRead)
nRow, nCol = data.shape

# Inspect the data
data.info()
print(data.describe())
print(data.columns)

# Check for null values, then drop incomplete rows
print(data.isnull().sum())
data = data.dropna().reset_index(drop = True)

# Plot the target series
data['Adj Close'].plot()
plt.show()
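# Set the target variable
output_var = pd.DataFrame(data['Adj Close'])

# Select the features
# Note: these are same-day values, so the model estimates each day's
# Adj Close from that day's Open/High/Low/Volume rather than a future price
features = ["Open", "High", "Low", "Volume"]

# Normalize the features to the [0, 1] range
# (fitting on all rows lets the scaler see the test period; fit it on the
# training rows only for a stricter evaluation)
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
feature_transform = scaler.fit_transform(data[features])
feature_transform = pd.DataFrame(columns = features, data = feature_transform, index = data.index)
feature_transform.info()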
from sklearn.model_selection import TimeSeriesSplit
time_split = TimeSeriesSplit(n_splits = 10)

# Each split trains on an expanding window of past rows and tests on the rows
# that follow; after the loop, only the last (largest) split is kept
for train_index, test_index in time_split.split(feature_transform):
    x_train, x_test = feature_transform.iloc[train_index], feature_transform.iloc[test_index]
    y_train, y_test = output_var.iloc[train_index].values.ravel(), output_var.iloc[test_index].values.ravel()
# Process the data for LSTM: reshape to (samples, time steps = 1, features)
trainX = np.array(x_train)
testX = np.array(x_test)
x_train = trainX.reshape(trainX.shape[0], 1, trainX.shape[1])
x_test = testX.reshape(testX.shape[0], 1, testX.shape[1])
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.utils import plot_model

# Build the LSTM model: one LSTM layer with 32 units over a single time step,
# followed by one dense output for the predicted Adj Close
lstm = Sequential()
lstm.add(LSTM(32, input_shape=(1, trainX.shape[1]), activation="relu", return_sequences = False))
lstm.add(Dense(1))
lstm.compile(loss = "mean_squared_error", optimizer = "adam")

# Draw the architecture (requires pydot and Graphviz to be installed)
plot_model(lstm, show_shapes = True, show_layer_names = True)

# Train without shuffling so the samples keep their time order
history = lstm.fit(x_train, y_train, epochs = 100, batch_size = 8, verbose = 1, shuffle = False)

# LSTM prediction
y_pred = lstm.predict(x_test).ravel()

# Prediction vs true Adj Close value - LSTM
plt.plot(y_test, label = "True Value")
plt.plot(y_pred, label = "LSTM Value")
plt.title("Prediction by LSTM Model")
plt.xlabel("Time Scale")
plt.ylabel("Adj Close")
plt.legend()
plt.show()

# Evaluate on the test split
print("R2 score:", r2_score(y_test, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
# R2 score reported by the author: 0.87

# Thanks For Reading.
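The script scores the model on only the last of the 10 TimeSeriesSplit folds, and it fits MinMaxScaler on every row before splitting. The following is a minimal sketch of a walk-forward check that scores every fold and fits the scaler on each fold's training rows only. To keep it quick and free of Keras, it swaps in scikit-learn's LinearRegression for the LSTM and generates synthetic OHLCV data with the same column names (both assumptions); the real CSV and model can be substituted.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import TimeSeriesSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic OHLCV data (assumption) using the same column names as the script
rng = np.random.default_rng(2)
n = 1000
close = 1000 * np.exp(np.cumsum(rng.normal(0, 0.01, n)))
open_ = np.concatenate([[close[0]], close[:-1]]) * (1 + rng.normal(0, 0.002, n))
high = np.maximum(open_, close) * (1 + np.abs(rng.normal(0, 0.003, n)))
low = np.minimum(open_, close) * (1 - np.abs(rng.normal(0, 0.003, n)))
data = pd.DataFrame({"Open": open_, "High": high, "Low": low,
                     "Volume": rng.integers(100_000, 1_000_000, n), "Adj Close": close})

features = ["Open", "High", "Low", "Volume"]
X, y = data[features], data["Adj Close"]

fold_scores = []
for train_index, test_index in TimeSeriesSplit(n_splits=10).split(X):
    # The scaler sits inside the pipeline, so it is fitted on this fold's training rows only
    model = make_pipeline(MinMaxScaler(), LinearRegression())
    model.fit(X.iloc[train_index], y.iloc[train_index])
    fold_scores.append(r2_score(y.iloc[test_index], model.predict(X.iloc[test_index])))

print("R2 per fold:", np.round(fold_scores, 3))
print("Mean R2:", np.mean(fold_scores))
```

Because High and Low come from the same day as the target Adj Close, high per-fold R² values here mostly reflect that same-day information rather than forecasting skill.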
Output:
