
Principal Component Analysis (PCA)

Learn how PCA compresses high-dimensional data into a few principal components while preserving most of the variance.

What is PCA?

PCA finds new axes (principal components) that successively capture the maximum remaining variance in the data. Dropping the components with small variance reduces dimensionality while preserving most of the information.

  • Unsupervised linear transformation.
  • Components are orthogonal (uncorrelated).
  • Often used before visualization or modeling.
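The orthogonality claim above is easy to verify numerically: the rows of a fitted PCA's `components_` matrix are orthonormal, so their pairwise dot products form an identity matrix. A minimal sketch on synthetic data (the toy dataset here is illustrative, not from the text):

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 100 samples, 3 correlated features (illustrative only)
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 1))
X = np.hstack([base + rng.normal(scale=0.1, size=(100, 1)) for _ in range(3)])

pca = PCA(n_components=3)
pca.fit(X)

# Rows of components_ are the principal axes; orthonormality means
# components_ @ components_.T is (approximately) the identity matrix.
dots = pca.components_ @ pca.components_.T
print(np.round(dots, 6))
```

Because the components are uncorrelated, each one adds variance information the others do not already carry.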

Example: PCA on Iris Dataset

Reduce the 4-dimensional Iris data to 2 dimensions:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load the Iris dataset: 150 samples, 4 features, 3 classes
iris = load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names

# Project the 4D data onto the top 2 principal components.
# (Iris features are on similar scales; in general, standardize
# features before PCA so no single feature dominates the variance.)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Fraction of the total variance captured by each component
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Scatter plot in the 2D PCA space, one color per class
plt.figure(figsize=(8, 6))
colors = ["navy", "turquoise", "darkorange"]

for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(
        X_pca[y == i, 0],  # coordinates along PC 1
        X_pca[y == i, 1],  # coordinates along PC 2
        color=color,
        alpha=0.7,
        label=target_name
    )

plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("PCA of Iris Dataset")
plt.legend()
plt.show()
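Instead of fixing the number of components up front, scikit-learn's PCA also accepts a float between 0 and 1 for n_components: it then keeps the smallest number of components whose cumulative explained variance reaches that fraction. A short sketch on the same Iris data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Keep as many components as needed to explain >= 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print("Components kept:", pca.n_components_)
print("Cumulative variance:", pca.explained_variance_ratio_.sum())
```

This is a convenient way to pick the dimensionality from a variance budget rather than guessing a fixed number.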