Question 05: Implement the K-Nearest Neighbors algorithm on the diabetes.csv dataset. Compute the confusion matrix, accuracy, error rate, precision and recall on the given dataset.
How can the K-Nearest Neighbors (KNN) algorithm be used to classify diabetes data, and which performance metrics should be calculated?
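To apply KNN to the diabetes.csv dataset:
- Preprocess the data: handle missing values and normalize the features. KNN compares samples by distance, so unscaled features with large ranges would otherwise dominate the result.
- Split the data into training and test sets.
- Fit the KNN model on the training data.
- Predict the labels of the test data with the fitted model.
- Evaluate the model using:
  - the confusion matrix, to count true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN)
  - Accuracy = (TP + TN) / (TP + FP + FN + TN)
  - Error Rate = 1 - Accuracy
  - Precision = TP / (TP + FP)
  - Recall = TP / (TP + FN)
These metrics show how well the model separates diabetes-positive from diabetes-negative cases: precision is the share of predicted positives that are truly positive, and recall is the share of actual positives that the model finds.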
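KNN has almost no training phase: fitting mainly stores the training set (scikit-learn may also build a search tree), and prediction finds the k training points closest to each new sample and takes a majority vote of their labels. The minimal NumPy sketch below shows that idea on hypothetical toy data. It assumes Euclidean distance and uniform (unweighted) votes, which are also the defaults of scikit-learn's KNeighborsClassifier; the function name knn_predict is illustrative and not part of any library.

```python
import numpy as np


def knn_predict(x_train, y_train, x_query, k=3):
    """Classify each row of x_query by a majority vote among its k nearest
    training rows (Euclidean distance). Labels must be non-negative ints."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train)
    x_query = np.asarray(x_query, dtype=float)
    preds = []
    for q in x_query:
        # Euclidean distance from the query point to every training point
        dists = np.sqrt(((x_train - q) ** 2).sum(axis=1))
        # Indices of the k closest training points
        nearest = np.argsort(dists)[:k]
        # Majority vote over their labels (a tie goes to the smaller label)
        preds.append(np.bincount(y_train[nearest]).argmax())
    return np.array(preds)


# Hypothetical toy data: class 0 clusters near (0, 0), class 1 near (5, 5)
x_train = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])
x_test = np.array([[0.5, 0.5], [5.5, 5.5]])
print(knn_predict(x_train, y_train, x_test, k=3))  # [0 1]
```

Because the vote depends only on distances, a feature measured in the hundreds would outweigh one measured in fractions unless the features are standardized first, which is why the preprocessing step normalizes them.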
What are the steps to compute the confusion matrix, accuracy, precision and recall for a KNN model applied to the diabetes dataset?
- Train/Test Split: Divide the dataset into training (e.g., 80%) and test (20%) sets.
- Model Training: Fit the KNeighborsClassifier on the training set using scikit-learn.
- Prediction: Predict outcomes on the test set.
- Compute Metrics:
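from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
# Example values: true labels vs. model predictions (1 = positive class)
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
conf_matrix = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class
accuracy = accuracy_score(y_true, y_pred)       # (TP + TN) / total
precision = precision_score(y_true, y_pred)     # TP / (TP + FP)
recall = recall_score(y_true, y_pred)           # TP / (TP + FN)
print(conf_matrix)                              # 2x2 array: [[2 0] [1 2]]
print(accuracy, precision, recall)              # 0.8 1.0 0.6666666666666666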
These metrics give insight into the model's performance and make it easy to compare KNN with other classifiers.
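The formulas from the first answer can be checked against scikit-learn. A short sketch, reusing the same example values: for binary labels 0 and 1, confusion_matrix returns [[TN, FP], [FN, TP]], so .ravel() unpacks the four counts in that order, and the hand-computed metrics should match the library helpers.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Same example values as above
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

# Rows = true class, columns = predicted class:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Apply the formulas by hand
accuracy = (tp + tn) / (tp + fp + fn + tn)
error_rate = 1 - accuracy
precision = tp / (tp + fp)  # undefined if the model predicts no positives
recall = tp / (tp + fn)

print("TN, FP, FN, TP:", tn, fp, fn, tp)  # 2 0 1 2
print("Accuracy:", accuracy)              # 0.8
print("Error rate:", error_rate)          # about 0.2
print("Precision:", precision)            # 1.0
print("Recall:", recall)                  # about 0.667

# scikit-learn's helpers should give the same numbers
print("scikit-learn:", accuracy_score(y_true, y_pred),
      precision_score(y_true, y_pred), recall_score(y_true, y_pred))
```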
Programming Code: Write the following code in ML_P05.py:
# ML Project Program 05
# K-Nearest Neighbors Algorithm on diabetes.csv dataset
import pandas as pd
import numpy as np
data = pd.read_csv("./diabetes_dataset/diabetes.csv")
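# Explore the data (print() is needed when running as a script rather than in a notebook)
print(data.head())
data.info()
print(data.describe())
print(data.columns)
# Checking null values
print(data.isnull().sum())
# Create the feature matrix (every column except "Outcome") and the target vector
data_x = data.drop(columns="Outcome")
data_y = data["Outcome"]
print(data.shape)
print(data_x.shape, data_y.shape)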
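# Split into Train & Test first, so the scaler never sees the test data
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    data_x, data_y, test_size=0.2, random_state=42, stratify=data_y)
# random_state fixes the split so results are reproducible (42 is arbitrary);
# stratify keeps the Outcome class ratio the same in both sets
# Standardize features: fit the scaler on the training set only, then apply it to both sets
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
x_train = scale.fit_transform(x_train)
x_test = scale.transform(x_test)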
# Machine Learning Model - KNN
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors = 7)
knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
from sklearn import metrics
# Confusion Matrix
cs = metrics.confusion_matrix(y_test, y_pred)
print("Confusion Matrix is : \n", cs)
# Accuracy score
ac = metrics.accuracy_score(y_test, y_pred)
print("Accuracy score is : ", ac)  # Author's run: about 0.695 (69%); varies with the split
# Error Rate
er = 1 - ac
print("Error rate is : ", er)  # Author's run: about 0.305
# Precision
p = metrics.precision_score(y_test, y_pred)
print("Precision: ", p)
# Recall
r = metrics.recall_score(y_test, y_pred)
print("Recall: ", r)
# Author's run: precision about 0.607 and recall about 0.534;
# exact values depend on the train/test split.
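The script checks for nulls, but data.isnull().sum() may report none even when values are missing. Assuming this diabetes.csv is the common Pima Indians version (an assumption; check your file), missing measurements are stored as 0 in columns such as Glucose, BloodPressure, SkinThickness, Insulin and BMI, where a true zero is physiologically impossible. A minimal sketch that treats those zeros as missing and fills them with the training-set median; the helper name fill_zero_as_missing and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

# ASSUMPTION: Pima-style column names in which 0 means "not measured".
# Adjust this list to match the columns of your diabetes.csv.
ZERO_MEANS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def fill_zero_as_missing(train_df, test_df, columns=ZERO_MEANS_MISSING):
    """Treat 0 as missing in `columns` and fill it with the TRAINING median.

    Using only training-set medians keeps test data out of preprocessing.
    Returns new DataFrames; the inputs are not modified.
    """
    train_df = train_df.copy()
    test_df = test_df.copy()
    for col in columns:
        if col not in train_df.columns:
            continue  # skip columns this file does not have
        train_df[col] = train_df[col].replace(0, np.nan)
        test_df[col] = test_df[col].replace(0, np.nan)
        median = train_df[col].median()  # NaNs are ignored
        train_df[col] = train_df[col].fillna(median)
        test_df[col] = test_df[col].fillna(median)
    return train_df, test_df


# Hypothetical toy rows; Pregnancies = 0 is a real value and is left alone
train = pd.DataFrame({"Glucose": [100, 0, 120, 140],
                      "BMI": [30.0, 25.0, 0.0, 35.0],
                      "Pregnancies": [0, 2, 1, 0]})
test = pd.DataFrame({"Glucose": [0, 110],
                     "BMI": [28.0, 0.0],
                     "Pregnancies": [3, 0]})
train_f, test_f = fill_zero_as_missing(train, test)
print(train_f)
print(test_f)
```

In the script, it would be applied to the raw feature DataFrames after splitting and before the StandardScaler is fitted, so the scaler sees the filled values.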
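The script uses n_neighbors = 7 without showing how that value was chosen. One common way to pick k, sketched here under stated assumptions, is 5-fold cross-validation over odd values of k on the training set only, with the scaler and KNN wrapped in a scikit-learn Pipeline so each fold is scaled using only its own training portion. The make_classification data is a hypothetical stand-in with the shape of the Pima diabetes data (768 rows, 8 features); swap in data_x and data_y to use the real file. The search range 1 to 31 is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (ASSUMPTION): replace with data_x, data_y from diabetes.csv
X, y = make_classification(n_samples=768, n_features=8, random_state=0)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# The scaler inside the pipeline is re-fitted on each CV training fold,
# so validation folds never influence the scaling
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Odd k values avoid tied votes in binary classification
param_grid = {"knn__n_neighbors": list(range(1, 32, 2))}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(x_train, y_train)

print("Best k:", search.best_params_["knn__n_neighbors"])
print("CV accuracy:", round(search.best_score_, 3))
print("Test accuracy:", round(search.score(x_test, y_test), 3))
```

The test set is used only once, after k has been chosen, so the reported test accuracy is not biased by the search.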