K-Nearest Neighbors (KNN)

Learn how KNN classifies a new point from the labels of its nearest neighbors, with a short explanation of the theory and a Python code example.

What is KNN?

K-Nearest Neighbors is a lazy learning algorithm: it does not learn an explicit model during training. Instead, it stores the training data and defers the real work until a prediction is requested.
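To make the "lazy" part concrete, here is a minimal from-scratch sketch in NumPy. The class name `SimpleKNN` is hypothetical, and this is not how scikit-learn implements KNN (scikit-learn can also build a tree-based index at fit time). `fit` only memorizes the data, while `predict` performs the neighbor search and majority vote described below, using Euclidean distance.

```python
import numpy as np
from collections import Counter


class SimpleKNN:
    """Hypothetical minimal KNN classifier, for illustration only."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" is just memorizing the data: no parameters are learned.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        preds = []
        for x in X:
            # All the neighbor search happens here, at prediction time.
            dists = np.sqrt(((self.X_train - x) ** 2).sum(axis=1))  # Euclidean distance to every training point
            nearest = np.argsort(dists)[: self.k]                    # indices of the K closest points
            votes = Counter(self.y_train[nearest])                   # count the neighbors' labels
            preds.append(votes.most_common(1)[0][0])                 # majority label wins
        return np.array(preds)


# Toy data: two well-separated groups labeled 0 and 1
X_toy = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_toy = [0, 0, 0, 1, 1, 1]

model = SimpleKNN(k=3).fit(X_toy, y_toy)
print(model.predict([[0.5, 0.5], [5.5, 5.5]]))  # [0 1]
```

Ties in the vote are broken by `Counter.most_common`, which keeps equally common labels in the order first encountered (here, the order of increasing distance); scikit-learn may break ties differently.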

For a new point, it finds the K training points closest to it (its neighbors).
For classification, it assigns the label that is most common among those K neighbors (a majority vote).
Closeness is usually measured with Euclidean distance when the features are numeric.
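Euclidean distance is the p = 2 case of the Minkowski distance, d(a, b) = (sum_i |a_i - b_i|^p)^(1/p), which is why the scikit-learn example below sets `metric="minkowski"` and `p=2`; p = 1 gives Manhattan distance. A small NumPy sketch (the function name `minkowski_distance` is our own, for illustration):

```python
import numpy as np


def minkowski_distance(a, b, p=2):
    """Minkowski distance between two vectors: p=2 is Euclidean, p=1 is Manhattan."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Sum the per-feature absolute differences raised to p, then take the p-th root.
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


a, b = [0, 0], [3, 4]
print(minkowski_distance(a, b, p=2))  # 5.0 (Euclidean)
print(minkowski_distance(a, b, p=1))  # 7.0 (Manhattan)
```

Because the per-feature differences are summed, a feature with a large numeric range can dominate the distance; this is why the Iris example standardizes the features before fitting.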

Example: KNeighborsClassifier

KNN on the Iris Dataset

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

iris = load_iris()
X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale features for distance-based algorithms
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn = KNeighborsClassifier(
    n_neighbors=5,    # K value
    metric="minkowski",
    p=2               # p=2 => Euclidean distance
)

knn.fit(X_train_scaled, y_train)
y_pred = knn.predict(X_test_scaled)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nReport:\n", classification_report(y_test, y_pred, target_names=iris.target_names))
```
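The example fixes K at 5. One common way to choose K is cross-validation on the training set only, so the test set stays untouched for the final evaluation. This sketch assumes the same split as above and an illustrative range of odd K values. Putting `StandardScaler` inside a pipeline means the scaler is refit on each training fold, so no statistics leak in from the validation folds.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

scores = {}
for k in range(1, 16, 2):  # candidate K values (an illustrative range)
    # Scaling happens inside the pipeline, so it is refit on each training fold.
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = cross_val_score(model, X_train, y_train, cv=5).mean()

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"K={k:2d}  mean CV accuracy={s:.3f}")
print("Best K:", best_k)
```

The chosen value can then be passed as `KNeighborsClassifier(n_neighbors=best_k)` and evaluated once on the held-out test set. Small K values give more flexible but noisier decision boundaries; larger values smooth them out.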
