Question 06: Implement K-Means clustering / hierarchical clustering on the sales_data_sample.csv dataset. Determine the number of clusters using the elbow method.
How can K-Means Clustering be applied to the `sales_data_sample.csv` dataset, and how is the Elbow Method used to determine the optimal number of clusters?
To apply K-Means Clustering on sales_data_sample.csv:
- Preprocess the data: select relevant numerical features (e.g., SALES, QUANTITYORDERED), handle missing values, and scale the features using StandardScaler.
- Apply the Elbow Method:
  - Run K-Means for a range of k (e.g., 1 to 10).
  - For each k, calculate the inertia (sum of squared distances of samples to their closest cluster center).
  - Plot k against inertia.
  - The elbow point, where the rate of decrease changes sharply, marks the optimal k.
This approach helps segment the data into meaningful clusters such as sales performance tiers or customer groupings.
Write a short Python snippet to implement the Elbow Method for determining the number of clusters in the sales dataset using K-Means?
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
import pandas as pd
# Load and preprocess data
data = pd.read_csv('sales_data_sample.csv', encoding='latin1')  # the file is not UTF-8 encoded
X = data[['SALES', 'QUANTITYORDERED']].dropna()
X_scaled = StandardScaler().fit_transform(X)
# Elbow method
inertia = []
K_range = range(1, 11)
for k in K_range:
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(X_scaled)
    inertia.append(kmeans.inertia_)
# Plot
plt.plot(K_range, inertia, 'bo-')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Inertia')
plt.title('Elbow Method for Optimal k')
plt.show()
This code helps visualize the elbow point for optimal k selection, which can then be used to apply final clustering.
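If the elbow appears at, say, k = 3 (an assumed value; read it off your own plot), a minimal sketch of the final clustering step could look like this, reusing X and X_scaled from the snippet above:
# Final clustering with the k suggested by the elbow plot (k = 3 is assumed here)
final_k = 3
kmeans_final = KMeans(n_clusters=final_k, random_state=42)
labels = kmeans_final.fit_predict(X_scaled)
# Attach the cluster labels and summarise each cluster
clustered = X.copy()
clustered['CLUSTER'] = labels
print(clustered.groupby('CLUSTER')[['SALES', 'QUANTITYORDERED']].mean())
Grouping by the cluster label and averaging SALES and QUANTITYORDERED gives a quick read on which cluster corresponds to high-, mid-, and low-value orders.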
Programming Code: The following code goes in ML_P06.py:
# ML Project Program 06
# K-Means clustering/ hierarchical clustering on sales_data_sample.csv dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.read_csv("./salesdata_sample_dataset/sales_data_sample.csv", encoding = 'latin1')
data
data.info()
data.describe()
data.columns
data.shape
# keep two numeric features for clustering and drop rows with missing values
data = data[['QUANTITYORDERED', 'ORDERLINENUMBER']]
new_data = data.dropna(axis=0)
from sklearn.cluster import KMeans
import seaborn as sns
wcss = []
for i in range(1, 11):
    clustering = KMeans(n_clusters=i, init='k-means++', random_state=42)
    clustering.fit(new_data)
    wcss.append(clustering.inertia_)
ks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sns.lineplot(x = ks, y = wcss)
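# Reading the plot: WCSS drops quickly at first and then flattens; the k at the
# bend ("elbow") is the value to use for the final model.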
fig, axes = plt.subplots(nrows = 1, ncols = 2, figsize = (15, 5))
sns.scatterplot(ax = axes[0], data = new_data, x = 'QUANTITYORDERED', y = 'ORDERLINENUMBER').set_title('without clustering')
sns.scatterplot(ax = axes[1], data = new_data, x = 'QUANTITYORDERED', y = 'ORDERLINENUMBER', hue = clustering.labels_).set_title('Using Elbow Clustering Method')
new_data.describe().T
from sklearn.preprocessing import StandardScaler
ss = StandardScaler()
scaled = ss.fit_transform(new_data)  # standardize the features before re-running the elbow method
wcss_sc = []
for i in range(1, 11):
    clustering_sc = KMeans(n_clusters=i, init='k-means++', random_state=42)
    clustering_sc.fit(scaled)
    wcss_sc.append(clustering_sc.inertia_)
ks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
sns.lineplot(x = ks, y = wcss_sc)
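# Note: the scatter plots below colour points with clustering.labels_ /
# clustering_sc.labels_ from the last loop iteration, i.e. k = 10. If the elbow
# plot suggests a smaller k (say 3 -- an assumed value), refit before plotting:
# clustering = KMeans(n_clusters=3, init='k-means++', random_state=42).fit(new_data)
# clustering_sc = KMeans(n_clusters=3, init='k-means++', random_state=42).fit(scaled)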
fig, axes = plt.subplots(nrows = 1, ncols = 3, figsize = (15, 5))
sns.scatterplot(ax = axes[0], data = new_data, x = 'QUANTITYORDERED', y = 'ORDERLINENUMBER').set_title('without clustering')
sns.scatterplot(ax = axes[2], data = new_data, x = 'QUANTITYORDERED', y = 'ORDERLINENUMBER', hue = clustering.labels_).set_title('Using Elbow Clustering Method')
sns.scatterplot(ax = axes[1], data = new_data, x = 'QUANTITYORDERED', y = 'ORDERLINENUMBER', hue = clustering_sc.labels_).set_title('Using Elbow Clustering Method & Scaled Data')
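# The assignment also mentions hierarchical clustering. A minimal sketch using
# scipy's dendrogram and scikit-learn's AgglomerativeClustering is shown below;
# the cut at 3 clusters is an assumption, not a value derived from this dataset.
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering
# Build the linkage matrix on the scaled features and draw a dendrogram
linked = linkage(scaled, method='ward')
plt.figure(figsize=(10, 5))
dendrogram(linked, truncate_mode='lastp', p=20)
plt.title('Hierarchical Clustering Dendrogram (Ward linkage)')
plt.xlabel('Merged clusters')
plt.ylabel('Distance')
plt.show()
# Cut the tree into an assumed 3 clusters and compare with the K-Means plots
agg = AgglomerativeClustering(n_clusters=3, linkage='ward')
agg_labels = agg.fit_predict(scaled)
sns.scatterplot(data=new_data, x='QUANTITYORDERED', y='ORDERLINENUMBER', hue=agg_labels).set_title('Agglomerative clustering (scaled data)')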
# Thanks For Reading.