
Bias–Variance Tradeoff

Learn how model complexity affects bias, variance, overfitting, and underfitting, and see a simple visualization using polynomial regression in Python.

What are Bias and Variance?

  • Bias: error from using an overly simple model that cannot capture the true pattern (underfitting).
  • Variance: error from a model that is too sensitive to small changes in the training data (overfitting).
Goal: find a model that balances bias and variance so it performs well on unseen data.
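These definitions can be made concrete numerically. The sketch below (the query point, noise level, and dataset counts are illustrative choices, not from any standard recipe) assumes squared loss and a known true function sin(2πx): it fits many polynomials of a fixed degree on independently resampled training sets, then measures the squared bias and the variance of the prediction at one query point. Switching `degree` from 1 to 10 shifts the error from the bias term to the variance term.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    return np.sin(2 * np.pi * x)

x_query = 0.3      # single query point, for clarity
n_datasets = 500   # number of independently resampled training sets
n_points = 20      # points per training set
degree = 1         # try 1 (high bias) vs 10 (high variance)

preds = []
for _ in range(n_datasets):
    # Fresh training set: random inputs, noisy targets
    x = rng.uniform(0, 1, n_points)
    y = true_f(x) + rng.normal(scale=0.2, size=n_points)
    coefs = np.polyfit(x, y, degree)
    preds.append(np.polyval(coefs, x_query))

preds = np.array(preds)
bias_sq = (preds.mean() - true_f(x_query)) ** 2  # squared bias
variance = preds.var()                           # spread across datasets
print(f"degree={degree}: bias^2={bias_sq:.4f}, variance={variance:.4f}")
```

With `degree = 1` the average prediction sits far from the true sine value, so squared bias dominates; the predictions themselves barely move between datasets, so variance stays small.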

Underfitting vs Overfitting

Underfitting (High Bias)
  • Model is too simple (e.g., a straight line fit to a complex curve).
  • High error on both train and test sets.
  • Solution: increase model complexity, add features.
Overfitting (High Variance)
  • Model is too complex and memorizes noise.
  • Very low train error but high test error.
  • Solution: regularization, simpler model, more data.
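The regularization remedy can be sketched with scikit-learn's Ridge, which adds an L2 penalty on the coefficients. The comparison below (the degree and `alpha` are illustrative, untuned choices) refits the same degree-10 polynomial with and without the penalty; scaling the features first keeps the penalty comparable across terms.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=50)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=42)

mses = {}
for name, reg in [("no penalty", LinearRegression()),
                  ("ridge", Ridge(alpha=1.0))]:
    # Same degree-10 features; only the estimator's penalty differs
    model = make_pipeline(PolynomialFeatures(degree=10, include_bias=False),
                          StandardScaler(), reg)
    model.fit(X_tr, y_tr)
    mses[name] = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name:>10}: test MSE={mses[name]:.3f}")
```

The penalty shrinks the wildest coefficients of the degree-10 fit, trading a little extra bias for a reduction in variance.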

Example: Polynomial Regression Degrees

We will fit polynomial models of different degrees to noisy data and compare train and test errors to see underfitting and overfitting.

Bias–Variance with Polynomial Degree
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Generate synthetic data from a non-linear function
np.random.seed(42)
X = np.linspace(0, 1, 50).reshape(-1, 1)
y_true = np.sin(2 * np.pi * X).ravel()
y = y_true + np.random.normal(scale=0.2, size=len(y_true))  # add noise

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

degrees = [1, 3, 10]  # simple, medium, very complex

# Plot the noisy data once; each fitted curve is added in the loop below
plt.figure(figsize=(8, 5))
plt.scatter(X, y, color="gray", s=15, label="noisy data")
X_plot = np.linspace(0, 1, 200).reshape(-1, 1)  # dense grid for smooth curves

for degree in degrees:
    # Create polynomial features
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    X_train_poly = poly.fit_transform(X_train)
    X_test_poly = poly.transform(X_test)

    # Fit linear regression on polynomial features
    model = LinearRegression()
    model.fit(X_train_poly, y_train)

    # Predictions
    y_train_pred = model.predict(X_train_poly)
    y_test_pred = model.predict(X_test_poly)

    # Errors
    train_mse = mean_squared_error(y_train, y_train_pred)
    test_mse = mean_squared_error(y_test, y_test_pred)

    print(f"Degree {degree}: Train MSE={train_mse:.3f}, Test MSE={test_mse:.3f}")

    # Visualize the fitted curve on the dense grid
    plt.plot(X_plot, model.predict(poly.transform(X_plot)),
             label=f"degree {degree}")

plt.xlabel("x")
plt.ylabel("y")
plt.title("Polynomial fits of increasing degree")
plt.legend()
plt.show()
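A natural next step is to let the data choose the degree rather than eyeballing three values. The sketch below (the degree range and fold count are illustrative choices) scores each degree with 5-fold cross-validation; shuffling the folds matters here because X is sorted, so contiguous folds would force the model to extrapolate.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(42)
X = np.linspace(0, 1, 50).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=50)

# Shuffle folds: X is sorted, and contiguous folds would mean extrapolation
cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_mse = {}
for degree in range(1, 13):
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression())
    # scikit-learn reports negated MSE so "higher is better"; flip the sign
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()

best = min(cv_mse, key=cv_mse.get)
print(f"Best degree by 5-fold CV: {best} (MSE={cv_mse[best]:.3f})")
```

The cross-validated error traces the familiar U shape: it falls as bias shrinks, then rises again as variance takes over, and the minimum marks the balanced middle ground.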