Question 03: Given a bank customer, build a neural network-based classifier that can determine whether they will leave or not in the next 6 months.
Dataset Description: The dataset contains 14 distinct features such as Customer Id, Credit Score, Geography, Gender, Age, Tenure, Balance, etc. Perform the following steps:
1) Read the dataset.
2) Distinguish the feature and target set and divide the data set into training and test sets.
3) Normalize the train and test data.
4) Initialize and build the model. Identify the points of improvement and implement the same.
5) Print the accuracy score and confusion matrix.
To build a neural network classifier for customer churn prediction, follow these steps:
- Read the Dataset: Load the dataset using libraries like pandas and inspect its structure.
- Feature and Target Separation:
- Features: All input columns like Credit Score, Geography, Age, etc.
- Target: A binary column indicating churn (0 = stay, 1 = leave).
- Train-Test Split: Use train_test_split() to divide the dataset (e.g., 80% train, 20% test).
- Normalize Data: Fit a StandardScaler or MinMaxScaler on the training data and apply it to both sets so all features share the same scale (see the sketch after this list).
- Build the Neural Network:
- Use libraries like TensorFlow/Keras or PyTorch.
- Design input, hidden, and output layers with appropriate activation functions (e.g., ReLU for hidden layers, Sigmoid for the output).
- Compile using binary cross-entropy loss and an optimizer like Adam.
- Train the Model and Evaluate: Train for several epochs and evaluate performance using:
- Accuracy Score
- Confusion Matrix to assess true/false positives and negatives.
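A minimal sketch of steps 1-3, assuming the same Churn_Modelling.csv file used in the full program below (variable names and the 80/20 split are illustrative):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("./Churn_Modelling_dataset/Churn_Modelling.csv")   # step 1: read the dataset
X = df.drop(columns = ['RowNumber', 'CustomerId', 'Surname', 'Exited'])   # step 2: feature columns
X = pd.get_dummies(X, columns = ['Geography', 'Gender'])   # one-hot encode categoricals
y = df['Exited']   # step 2: binary churn target (0 = stay, 1 = leave)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 42)
scaler = StandardScaler().fit(X_train)   # step 3: fit the scaler on training data only
X_train = scaler.transform(X_train)   # then transform both sets,
X_test = scaler.transform(X_test)   # which avoids test-set leakage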
How can the neural network's performance for churn prediction be improved?
You can improve the neural network’s performance using the following strategies:
- Hyperparameter Tuning:
- Adjust the number of layers, neurons per layer, learning rate, batch size, and number of epochs.
- Use grid search or random search for optimization.
- Dropout Regularization: Add dropout layers to prevent overfitting by randomly disabling neurons during training.
- Feature Engineering:
- Convert categorical features (Geography, Gender) into numerical form using one-hot encoding.
- Derive new features from existing ones (e.g., age groups or customer tenure bands).
- Data Balancing: If the target variable is imbalanced (e.g., more “stay” than “leave” examples), apply:
- SMOTE (Synthetic Minority Oversampling Technique)
- Class weight adjustments during model training
- Early Stopping: Use callbacks to stop training when validation performance no longer improves.
These enhancements can significantly increase the model’s ability to generalize and accurately predict churn behavior; the sketches below illustrate dropout, early stopping, and data balancing.
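A minimal sketch of dropout regularization and early stopping in Keras, assuming the X_train and y_train arrays from the sketch above (layer sizes, dropout rate, and patience are illustrative choices, not tuned values):

import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.callbacks import EarlyStopping

model = tf.keras.Sequential([
    Dense(16, activation = 'relu', input_shape = (X_train.shape[1], )),
    Dropout(0.3),   # randomly disable 30% of neurons each training step
    Dense(16, activation = 'relu'),
    Dropout(0.3),
    Dense(1, activation = 'sigmoid'),   # single sigmoid unit for binary churn output
])
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
early_stop = EarlyStopping(monitor = 'val_loss', patience = 10, restore_best_weights = True)
history = model.fit(X_train, y_train, validation_split = 0.2, epochs = 200,
                    batch_size = 32, callbacks = [early_stop])   # halts once val_loss stalls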
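For data balancing, a sketch of both options named above, assuming the model and arrays from the previous sketch; note that SMOTE comes from the separate imbalanced-learn package:

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Option 1: weight the minority "leave" class more heavily during training
weights = compute_class_weight(class_weight = 'balanced', classes = np.unique(y_train), y = y_train)
model.fit(X_train, y_train, epochs = 50, batch_size = 32,
          class_weight = dict(enumerate(weights)))

# Option 2: oversample the minority class before training
from imblearn.over_sampling import SMOTE
X_res, y_res = SMOTE(random_state = 42).fit_resample(X_train, y_train)
model.fit(X_res, y_res, epochs = 50, batch_size = 32)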
Programming Code: Write the following code in ML_P03.py:
# ML Project Program 03
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
bank_data = pd.read_csv("./Churn_Modelling_dataset/Churn_Modelling.csv")
bank_data
bank_data.info()
bank_data.describe()
# Exploratory Data Analysis
bank_data.shape
# checking null values
bank_data.isnull().sum()
# Drop the columns which are unique for all users like ID
new_bank_data = bank_data.drop(['RowNumber', 'CustomerId', 'Surname'], axis = 1)
new_bank_data.info()
import matplotlib.pyplot as plt
# Distribution of Numerical Column
num_cal_dis = ['CreditScore', 'Age', 'Tenure', 'Balance', 'EstimatedSalary']
for i in num_cal_dis:
    plt.hist(bank_data[i])
    plt.title(i)
    plt.show()
# Model Training
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
categorical_feature_mask = new_bank_data.dtypes == object
categorical_cols = new_bank_data.columns[categorical_feature_mask].tolist()
new_bank_data[categorical_cols] = new_bank_data[categorical_cols].apply(lambda col: le.fit_transform(col))
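# Note: LabelEncoder maps each object column to integer codes (e.g., Geography -> 0/1/2),
# which implies an ordering; for nominal features like Geography, one-hot encoding
# (pd.get_dummies) is the alternative suggested in the feature-engineering notes above.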
new_bank_data.info()
from scipy.stats import zscore
df_scaled = new_bank_data.apply(zscore)
x_col = df_scaled.columns.tolist()[:-1]   # all 10 feature columns (every column except the target)
y_col = new_bank_data.columns.tolist()[-1:]   # ['Exited'], the target column
x = df_scaled[x_col].values
y = np.array(new_bank_data['Exited'])
x, y
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2)
x_train.shape, y_train.shape
from tensorflow.keras.utils import to_categorical
# One Hot Encoding (Encoding output class label)
y_train = to_categorical(y_train, 2)
y_test = to_categorical(y_test, 2)
# Normalize data
from sklearn.preprocessing import Normalizer
normalize = Normalizer(norm = "max")
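# Note: Normalizer(norm = "max") rescales each sample (row) by its largest absolute
# value; this is sample-wise scaling, unlike the feature-wise StandardScaler/MinMaxScaler.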
x_train = normalize.transform(x_train)
x_train
x_test = normalize.transform(x_test)
x_test
import tensorflow as tf
# Building Machine Learning Model
model = tf.keras.Sequential()
from tensorflow.keras.layers import Dense
model.add(Dense(units = 6, activation = 'relu', input_shape = (10, )))   # input layer sized to the 10 features
model.add(Dense(20, activation = 'relu'))
model.add(Dense(2, activation = 'softmax'))
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', metrics = ['accuracy'])   # categorical loss matches the one-hot labels and softmax output
model.summary()
history = model.fit(x_train, y_train, batch_size= 45, epochs=200, validation_data=(x_test, y_test))
# Checking Train & Test Accuracy
test_score = model.evaluate(x_test, y_test, verbose=1)
train_score = model.evaluate(x_train, y_train, verbose=1)
test_score, train_score
y_pred = model.predict(x_test)
y_pred = (y_pred > 0.5)
# Testing the Neural Network
plt.plot(np.array(history.history['accuracy']) * 100)   # scale accuracy to percent
plt.plot(np.array(history.history['val_accuracy']) * 100)
plt.ylabel('accuracy')
plt.xlabel('epochs')
plt.legend(['train', 'validation'])
plt.title('Accuracy over epochs')
plt.show()
# Model Accuracy & Confusion Matrix
from sklearn.metrics import accuracy_score, confusion_matrix
y_test_labels = y_test.argmax(axis = 1)   # convert one-hot vectors back to 0/1 labels
y_pred_labels = y_pred.argmax(axis = 1)
print("Accuracy:", accuracy_score(y_test_labels, y_pred_labels))
# Rows are actual classes, columns are predicted: [[TN, FP], [FN, TP]]
print("Confusion Matrix:\n", confusion_matrix(y_test_labels, y_pred_labels))
# Model Accuracy is : ~86%
# Thanks For Reading.