K-Nearest Neighbors (KNN)
Learn how KNN classifies a new point based on the labels of its nearest neighbors, with a short theory section and a Python code example.
What is KNN?
K-Nearest Neighbors is a lazy learning algorithm: fitting only stores the training data, and the actual work (computing distances and voting) happens at prediction time.
- For a new point, it finds the K closest training points (neighbors).
- For classification, it takes a majority vote of their labels.
- The distance metric is usually Euclidean for numeric features, so features should be on comparable scales.
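The three steps above can be sketched in plain Python. This is a minimal illustration, not how scikit-learn implements KNN: the function name `knn_predict`, the toy points, and the "red"/"blue" labels are all made up for this example, and ties are resolved by whatever `Counter.most_common` returns.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x_new, k=3):
    """Hypothetical helper: label x_new by majority vote of its k nearest points."""
    # Pair each training point's Euclidean distance to x_new with its label
    distances = [(math.dist(x, x_new), label) for x, label in zip(X_train, y_train)]
    # Keep the k closest training points (the neighbors)
    neighbors = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data (invented for illustration): two well-separated clusters in 2D
X_toy = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
         [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_toy = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_predict(X_toy, y_toy, [1.1, 1.0], k=3))  # red
print(knn_predict(X_toy, y_toy, [5.1, 5.0], k=3))  # blue
```

Note that nothing is "trained" here: every prediction scans the full training set, which is what makes the algorithm lazy.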
Example: KNeighborsClassifier
KNN on Iris Dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
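# Load the Iris dataset: 150 samples, 4 numeric features, 3 classes
iris = load_iris()
X, y = iris.data, iris.target
# Hold out 20% for testing; stratify=y keeps class proportions equal in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)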
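# Scale features so that no single feature dominates the distance.
# The scaler is fit on the training split only and then applied to the test split.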
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
knn = KNeighborsClassifier(
    n_neighbors=5,       # K value
    metric="minkowski",
    p=2,                 # p=2 => Euclidean distance
)
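# "Fitting" stores the scaled training data (and builds a search index)
knn.fit(X_train_scaled, y_train)
# Each test point is labeled by a vote among its 5 nearest training points
y_pred = knn.predict(X_test_scaled)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nReport:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))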
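To connect the fitted model back to the theory, it can help to look at the neighbors behind a single prediction. The sketch below is a self-contained variant of the example above (it repeats the loading, splitting, and scaling steps) and relies on the default `KNeighborsClassifier` settings, which are assumed here to match the explicit `metric="minkowski"`, `p=2` configuration with uniform vote weights. The variable names `sample` and `neighbor_labels` are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train_scaled, y_train)

# Inspect the first test sample only (kept 2D, as scikit-learn expects)
sample = X_test_scaled[:1]

# kneighbors returns distances to, and training-set indices of, the K nearest points
distances, indices = knn.kneighbors(sample)
neighbor_labels = y_train[indices[0]]

print("Neighbor distances:", distances[0].round(3))
print("Neighbor labels:   ", iris.target_names[neighbor_labels])
# With uniform weights, predict_proba is the share of neighbors voting for each class
print("Vote shares:       ", knn.predict_proba(sample)[0])
print("Predicted class:   ", iris.target_names[knn.predict(sample)[0]])
```

The vote shares are multiples of 1/5 because each of the five neighbors contributes one equally weighted vote.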