Machine Learning Program / Project - 03

Question 03: Given a bank customer, build a neural network-based classifier that can determine whether they will leave or not in the next 6 months.
Dataset Description: The dataset has 14 columns, such as Customer Id, Credit Score, Geography, Gender, Age, Tenure, Balance, etc. Perform the following steps:
1) Read the dataset.
2) Distinguish the feature and target set and divide the data set into training and test sets.
3) Normalize the train and test data.
4) Initialize and build the model. Identify the points of improvement and implement the same.
5) Print the accuracy score and confusion matrix.

What are the key steps to build a neural network classifier for customer churn prediction?
To build a neural network classifier for customer churn prediction, follow these steps:
  1. Read the Dataset: Load the dataset using libraries like pandas and inspect its structure.
  2. Feature and Target Separation
    • Features: All input columns like Credit Score, Geography, Age, etc.
    • Target: A binary column indicating churn (0 = stay, 1 = leave).
  3. Train-Test Split: Use train_test_split() to divide the dataset (e.g., 80% train, 20% test).
  4. Normalize Data: Apply StandardScaler or MinMaxScaler to bring features to the same scale.
  5. Build the Neural Network
    • Use libraries like TensorFlow/Keras or PyTorch.
    • Design input, hidden, and output layers with appropriate activation functions (e.g., ReLU for hidden, Sigmoid for output).
    • Compile using binary cross-entropy loss and an optimizer like Adam.
  6. Train the Model and Evaluate: Train for several epochs and evaluate performance using:
    • Accuracy Score
    • Confusion Matrix to assess true/false positives and negatives.
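Steps 2 to 4 can be sketched as follows. This is a minimal illustration, not the program listed further down: the small random DataFrame is a hypothetical stand-in for the churn data, and the choice of StandardScaler is an assumption. The point it shows is that the scaler is fitted on the training rows only and then reused on the test rows, so no test-set statistics leak into training.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the churn data: three numeric columns and a binary target.
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame({
    "CreditScore": rng.integers(350, 850, n_rows),
    "Age": rng.integers(18, 90, n_rows),
    "Balance": rng.uniform(0, 250_000, n_rows),
    "Exited": rng.integers(0, 2, n_rows),
})

# Step 2: separate the features from the target column.
X = df.drop(columns=["Exited"]).to_numpy(dtype=float)
y = df["Exited"].to_numpy()

# Step 3: hold out 20% of the rows; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Step 4: fit the scaler on the training rows only, then apply it to the test rows.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

After this step each training column has mean 0 and standard deviation 1, while the test columns are only approximately standardized because they reuse the training statistics.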

How can the performance of the neural network be improved?
You can improve the neural network’s performance using the following strategies:
  1. Hyperparameter Tuning
    • Adjust number of layers, neurons per layer, learning rate, batch size, and number of epochs.
    • Use grid search or random search for optimization.
  2. Dropout Regularization: Add dropout layers to prevent overfitting by randomly disabling neurons during training.
  3. Feature Engineering
    • Convert categorical features (Geography, Gender) into numerical using one-hot encoding.
    • Derive new features from existing ones (e.g., age groups or customer tenure bands).
  4. Data Balancing: If the target variable is imbalanced (e.g., more “stay” than “leave” examples), apply:
    • SMOTE (Synthetic Minority Oversampling Technique)
    • Class weight adjustments during model training
  5. Early Stopping: Use callbacks to stop training when validation performance no longer improves.
These enhancements can significantly increase the model's ability to generalize and accurately predict churn behavior.
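Two of these ideas, one-hot encoding and class weights, can be sketched with pandas and scikit-learn. The six-row DataFrame below is a hypothetical sample, not real data; the "balanced" weighting scheme is one possible choice, computed as n_samples / (n_classes * class_count).

```python
import numpy as np
import pandas as pd
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical mini-sample with the two categorical columns named in the text.
df = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France", "Spain", "France"],
    "Gender": ["Female", "Male", "Male", "Female", "Male", "Female"],
    "Exited": [0, 0, 1, 0, 0, 1],
})

# One-hot encode the categorical columns: each category becomes its own 0/1 column.
encoded = pd.get_dummies(df, columns=["Geography", "Gender"], dtype=float)

# "balanced" weights give the rarer class a proportionally larger weight.
y = df["Exited"].to_numpy()
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}

print(encoded.columns.tolist())
print(class_weight)
```

A dictionary of this form can be passed to the class_weight argument of Keras model.fit, so that errors on the minority class count more in the loss.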

Programming Code:
Write the following code in: ML_P03.py
# ML Project Program 03 

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
bank_data = pd.read_csv("./Churn_Modelling_dataset/Churn_Modelling.csv")
# In a .py script a bare expression displays nothing, so print explicitly
print(bank_data)
bank_data.info()
print(bank_data.describe())
# Exploratory Data Analysis
print(bank_data.shape)
# Check for null values
print(bank_data.isnull().sum())
# Drop identifier-like columns (row number, customer ID, surname) that are not useful as features

new_bank_data = bank_data.drop(['RowNumber', 'CustomerId', 'Surname'], axis = 1)
new_bank_data.info()
import matplotlib.pyplot as plt

# Distribution of Numerical Column
num_cal_dis = ['CreditScore', 'Age', 'Tenure', 'Balance', 'EstimatedSalary']
for i in num_cal_dis:
    plt.hist(bank_data[i])
    plt.title(i)
    plt.show()
# Data preprocessing: encode categorical (text) columns as integers
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

categorical_feature_mask = new_bank_data.dtypes == object
categorical_cols = new_bank_data.columns[categorical_feature_mask].tolist()
new_bank_data[categorical_cols] = new_bank_data[categorical_cols].apply(lambda col: le.fit_transform(col))

new_bank_data.info()
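from scipy.stats import zscore

# Standardize every column to zero mean and unit variance
df_scaled = new_bank_data.apply(zscore)
# NOTE: the slice [1:10] starts at index 1, so CreditScore (index 0) is left out
# and 9 feature columns remain (the network below expects 9 inputs).
x_col = df_scaled.columns.tolist()[1:10]
y_col = new_bank_data.columns.tolist()[-1:]

x = df_scaled[x_col].values
# The target is taken from the unscaled frame so it stays 0/1
y = np.array(new_bank_data['Exited'])

print(x, y)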
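# Fix the seed for a reproducible split and stratify to keep the churn ratio equal in both sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state = 42, stratify = y)

print(x_train.shape, y_train.shape)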
from tensorflow.keras.utils import to_categorical

# One Hot Encoding (Encoding output class label)
y_train = to_categorical(y_train, 2)
y_test = to_categorical(y_test, 2)
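# Normalize data
# Normalizer rescales each row (sample) by its largest absolute value.
# It is stateless, so transform() can be called without fit().

from sklearn.preprocessing import Normalizer

normalize = Normalizer(norm = "max")
x_train = normalize.transform(x_train)

print(x_train)
x_test = normalize.transform(x_test)
print(x_test)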
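import tensorflow as tf
from tensorflow.keras.layers import Dense

# Build the neural network: two hidden ReLU layers and a 2-unit softmax output
model = tf.keras.Sequential()

model.add(Dense(units = 6, activation = 'relu', input_shape = (9, )))
model.add(Dense(20, activation = 'relu'))
model.add(Dense(2, activation = 'softmax'))
# The labels are one-hot encoded and the output is softmax, so categorical cross-entropy is the matching loss
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])
model.summary()
# Note: the test set doubles as validation data here, so validation accuracy is not an independent estimate
history = model.fit(x_train, y_train, batch_size= 45, epochs=200, validation_data=(x_test, y_test))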
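# Checking Train & Test Accuracy
# evaluate() returns [loss, accuracy]

test_score = model.evaluate(x_test, y_test, verbose=1)

train_score = model.evaluate(x_train, y_train, verbose=1)

print(test_score, train_score)


# Predicted class probabilities, one row per test sample
y_pred = model.predict(x_test)

# Threshold at 0.5 so each row becomes a boolean one-hot vector
y_pred = (y_pred > 0.5)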
# Plot training and validation accuracy per epoch
plt.plot(np.array(history.history['accuracy']) * 100) # convert to percent

plt.plot(np.array(history.history['val_accuracy']) * 100)
plt.ylabel('accuracy (%)')
plt.xlabel('epochs')
plt.legend(['train', 'validation'])
plt.title('Accuracy over epochs')
plt.show()
# Model Accuracy
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))
# Reported model accuracy: about 86% (the exact value varies between runs)
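Step 5 of the task also asks for a confusion matrix, which the listing above does not print. A minimal sketch of one way to get it is shown here, assuming one-hot test labels and softmax outputs as in the program. The two small arrays are hypothetical stand-ins for y_test and model.predict(x_test), so the block runs on its own.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical stand-ins for y_test (one-hot) and model.predict(x_test) (softmax rows).
y_true_onehot = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 0]])
y_prob = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3], [0.8, 0.2]])

# Collapse each row to a class index: 0 = stay, 1 = leave.
y_true = y_true_onehot.argmax(axis=1)
y_hat = y_prob.argmax(axis=1)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_hat)
acc = accuracy_score(y_true, y_hat)

print(cm)
print(acc)
```

With the real model, replacing the stand-in arrays by y_test and the raw output of model.predict(x_test) gives the matrix for the trained network.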

# Thanks For Reading.