Machine Learning Program / Project - 05

Question 05: Implement the K-Nearest Neighbors algorithm on the diabetes.csv dataset. Compute the confusion matrix, accuracy, error rate, precision, and recall on the given dataset.

How can the K-Nearest Neighbors (KNN) algorithm be used to classify diabetes data, and what performance metrics should be calculated?
KNN classifies each sample by a majority vote among the k closest training samples in feature space. To apply it to the diabetes.csv dataset:
  1. Preprocess the data: handle missing values and normalize the features.
  2. Split the data into training and test sets.
  3. Fit the KNN model using training data.
  4. Predict on test data using the model.
  5. Evaluate the model using:
    • Confusion Matrix to count true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).
    • Accuracy = (TP + TN) / (TP + FP + FN + TN)
    • Error Rate = 1 - Accuracy
    • Precision = TP / (TP + FP)
    • Recall = TP / (TP + FN)
These metrics help determine how well the model classifies diabetes-positive and -negative cases.
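The formulas above can be checked by hand with a few lines of plain Python. This is only a sketch: the function name `classification_metrics` and the counts passed to it are made up for illustration and are not results from diabetes.csv.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, error rate, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    # Guard against division by zero when nothing is predicted positive
    # or when there are no actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical counts chosen for easy arithmetic (not from the real dataset)
example = classification_metrics(tp=30, fp=10, fn=20, tn=40)
print(example)
```

With these made-up counts, accuracy is 70/100 = 0.70, precision is 30/40 = 0.75, and recall is 30/50 = 0.60.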

What are the steps to compute the confusion matrix, accuracy, precision, and recall for a KNN model applied to the diabetes dataset?
  1. Train/Test Split: Divide the dataset into training (e.g., 80%) and test (20%) sets.
  2. Model Training: Fit the KNeighborsClassifier on the training set using scikit-learn.
  3. Prediction: Predict outcomes on the test set.
  4. Compute Metrics:
    from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
    
    # Example values
    y_true = [0, 1, 1, 0, 1]
    y_pred = [0, 1, 0, 0, 1]
    
    conf_matrix = confusion_matrix(y_true, y_pred)
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
These metrics will give insights into model performance and help compare with other classifiers.
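To see where the numbers in a confusion matrix come from, the counts for the example values above can be tallied by hand. The helper `confusion_counts` below is a hypothetical name used for illustration, not part of scikit-learn, and it assumes binary labels with 1 as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for binary labels, treating `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1   # predicted positive, actually positive
            else:
                fp += 1   # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1   # predicted negative, actually positive
            else:
                tn += 1   # predicted negative, actually negative
    return tp, fp, fn, tn


# Same example values as in the snippet above
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)  # 2 0 1 2
```

For labels 0 and 1, scikit-learn's confusion_matrix arranges these counts as [[TN, FP], [FN, TP]], so the example corresponds to [[2, 0], [1, 2]], with accuracy 4/5, precision 2/2, and recall 2/3.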

Programming Code:
Write the following code in: ML_P05.py
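# ML Project Program 05
# K-Nearest Neighbors algorithm on the diabetes.csv dataset

import pandas as pd

# Load the dataset (the path is relative to the working directory)
data = pd.read_csv("./diabetes_dataset/diabetes.csv")

# Inspect the data; print() is needed when this runs as a script rather than in a notebook
print(data)
data.info()
print(data.describe())
print(data.columns)

# Check for null values
print(data.isnull().sum())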
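# Separate the features (X) from the target (y)
data_x = data.drop(columns="Outcome")
data_y = data["Outcome"]
print(data.shape)
print(data_x.shape, data_y.shape)

# Split into train and test sets (80% / 20%)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data_x, data_y, test_size=0.2)

# Standardize the features; fit the scaler on the training split only so that
# test-set statistics do not leak into training
from sklearn.preprocessing import StandardScaler
scale = StandardScaler()
x_train = scale.fit_transform(x_train)
x_test = scale.transform(x_test)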
# Machine Learning Model - KNN
from sklearn.neighbors import KNeighborsClassifier

# Each test sample is classified by a vote among its 7 nearest training samples
knn = KNeighborsClassifier(n_neighbors=7)

knn.fit(x_train, y_train)
y_pred = knn.predict(x_test)
from sklearn import metrics

# Confusion matrix (rows are actual classes, columns are predicted classes)
cs = metrics.confusion_matrix(y_test, y_pred)
print("Confusion matrix is:\n", cs)

# Accuracy score
ac = metrics.accuracy_score(y_test, y_pred)
print("Accuracy score is:", ac)    # about 69% in the original run

# Error rate
er = 1 - ac
print("Error rate is:", er)        # about 0.305 in the original run

# Precision
p = metrics.precision_score(y_test, y_pred)
print("Precision:", p)             # about 0.607 in the original run

# Recall
r = metrics.recall_score(y_test, y_pred)
print("Recall:", r)                # about 0.534 in the original run

# The train/test split is random, so these values change from run to run.
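For intuition about what KNeighborsClassifier does at prediction time, here is a minimal from-scratch sketch using only the standard library. It assumes Euclidean distance and an unweighted majority vote, which match scikit-learn's defaults, but it is not the library's implementation. The function name `knn_predict` and the toy 2-D points are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training point, paired with that point's label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the labels of the k closest points
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote; how ties are broken here is an arbitrary choice of this sketch
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters labelled 0 and 1
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (1.1, 1.0), k=3))  # 0
print(knn_predict(train_points, train_labels, (3.0, 3.0), k=3))  # 1
```

Because the vote depends on raw distances, features with large numeric ranges would dominate the result, which is why the features are standardized before fitting the model in the program above.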