K-Means Clustering
Learn how K-Means groups similar data points into clusters and how to implement it in Python with a simple example.
What is K-Means?
K-Means is an unsupervised learning algorithm that partitions data into K clusters. Each cluster is represented by a centroid (the mean of the points in that cluster), and every point belongs to the cluster whose centroid is nearest.
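Stated as a formula, K-Means looks for the assignment that minimizes the total squared distance between points and their centroids. The notation here is an assumption added for illustration: $C_k$ is the set of points in cluster $k$ and $\mu_k$ is its centroid.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

scikit-learn reports this quantity $J$ for a fitted model as `inertia_`.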
The algorithm works as follows:
1. Choose the number of clusters K.
2. Initialize K centroids.
3. Assign each point to its nearest centroid.
4. Recompute each centroid as the mean of the points assigned to it.
5. Repeat steps 3 and 4 until the assignments stop changing.
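The steps above can be sketched in plain NumPy. This is a minimal illustration, not how scikit-learn implements K-Means: the function name `kmeans_sketch`, the random choice of initial centroids from the data, the `n_iter` cap, and the tiny demo array are all assumptions made for this example.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Plain K-Means on an (n_samples, n_features) array (illustrative only)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 2: initialize centroids as k distinct points drawn from the data
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Step 3: assign each point to its nearest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)
        # Step 5: stop once the assignments no longer change
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: recompute each centroid as the mean of its assigned points
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Two well-separated groups of three points each (made-up demo data)
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans_sketch(X_demo, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 depends on the random initialization, so only the grouping itself is meaningful, not the label values.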
Example: Clustering Synthetic Data
KMeans with Elbow Method
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
# Generate sample data
X, y_true = make_blobs(
    n_samples=300,
    centers=4,
    cluster_std=0.60,
    random_state=0
)
# Elbow method to choose K
inertias = []
K_range = range(1, 10)
for k in K_range:
    # Fit one model per candidate K and record its inertia
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(X)
    inertias.append(kmeans.inertia_)  # sum of squared distances to centroids
plt.figure(figsize=(8, 4))
plt.plot(K_range, inertias, "bo-")
plt.xlabel("Number of clusters (K)")
plt.ylabel("Inertia")
plt.title("Elbow Method for Optimal K")
plt.grid(True, alpha=0.3)
plt.show()
# Fit final K-Means with chosen K (e.g., 4)
kmeans = KMeans(n_clusters=4, random_state=42)
y_kmeans = kmeans.fit_predict(X)
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=y_kmeans, cmap="viridis", alpha=0.7)
plt.scatter(
    kmeans.cluster_centers_[:, 0],
    kmeans.cluster_centers_[:, 1],
    s=200,
    c="red",
    marker="X",
    label="Centroids"
)
plt.title("K-Means Clustering Results")
plt.legend()
plt.show()
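The inertia plotted in the elbow curve is the sum of squared distances from each point to the centroid of its assigned cluster. As a rough check of that definition, the sketch below recomputes it by hand and compares it with `inertia_`. It regenerates the same synthetic data so that it runs on its own; the names `assigned_centroids` and `manual_inertia` are introduced here for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same synthetic data and settings as in the example above
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
kmeans = KMeans(n_clusters=4, random_state=42).fit(X)

# Look up the centroid of the cluster each point was assigned to
assigned_centroids = kmeans.cluster_centers_[kmeans.labels_]

# Sum of squared Euclidean distances from points to their centroids
manual_inertia = np.sum((X - assigned_centroids) ** 2)

print(manual_inertia, kmeans.inertia_)  # the two values should agree closely
```

Inertia keeps decreasing as K grows, so the elbow method looks for the point where the decrease levels off rather than for the minimum.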