Bias–Variance Tradeoff
Learn how model complexity affects bias, variance, overfitting, and underfitting, and see a simple visualization using polynomial regression in Python.
What are Bias and Variance?
- Bias: error from using an overly simple model that cannot capture the true pattern (underfitting).
- Variance: error from a model that is too sensitive to small changes in the training data (overfitting).
Goal: Find a model that balances low bias and low variance for best performance on unseen data.
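The tradeoff can be made concrete by simulating it: fit the same model class to many fresh noisy samples and, at a fixed test point, measure how far the average prediction is from the truth (squared bias) and how much predictions scatter around their own mean (variance). A minimal sketch, using a sine ground truth and NumPy's `polyfit` as the model; the sample sizes, degrees, and test point are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(2 * np.pi * x)

x_test = 0.3                 # fixed point where we evaluate each model
preds = {1: [], 10: []}      # polynomial degree -> predictions at x_test

# Refit each model on 200 independent noisy samples
for _ in range(200):
    x = rng.uniform(0, 1, 30)
    y = true_fn(x) + rng.normal(scale=0.2, size=30)
    for degree in preds:
        coefs = np.polyfit(x, y, deg=degree)
        preds[degree].append(np.polyval(coefs, x_test))

results = {}
for degree, p in preds.items():
    p = np.array(p)
    bias_sq = (p.mean() - true_fn(x_test)) ** 2   # systematic error
    variance = p.var()                            # sensitivity to the sample
    results[degree] = (bias_sq, variance)
    print(f"degree {degree}: bias^2={bias_sq:.4f}, variance={variance:.4f}")
```

The degree-1 model misses the sine shape in the same way on every sample (high bias, low variance), while the degree-10 model tracks the truth on average but changes a lot from sample to sample (low bias, higher variance).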
Underfitting vs Overfitting
Underfitting (High Bias)
- Model is too simple (e.g., straight line for complex curve).
- High error on both train and test sets.
- Solution: increase model complexity, add features.
Overfitting (High Variance)
- Model is too complex and memorizes noise.
- Very low train error but high test error.
- Solution: regularization, simpler model, more data.
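One of these fixes, regularization, can be sketched with scikit-learn's `Ridge`: it keeps the same flexible degree-10 features but penalizes large coefficients, shrinking the wiggly fit toward something smoother. A hedged sketch; the `alpha` value here is an illustrative choice, not a tuned one:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 30).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=30)

# Identical degree-10 features; only the penalty on coefficients differs
plain = make_pipeline(PolynomialFeatures(10), LinearRegression()).fit(X, y)
ridge = make_pipeline(PolynomialFeatures(10), Ridge(alpha=1e-2)).fit(X, y)

plain_norm = np.linalg.norm(plain.named_steps["linearregression"].coef_)
ridge_norm = np.linalg.norm(ridge.named_steps["ridge"].coef_)
print(f"coefficient norm  plain: {plain_norm:.1f}  ridge: {ridge_norm:.1f}")
```

The ridge model's coefficients are far smaller in magnitude, which is exactly the mechanism that reduces variance at the cost of a little bias.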
Example: Polynomial Regression Degrees
We will fit polynomial models of different degrees to noisy data and compare train and test errors to see underfitting and overfitting.
Bias–Variance with Polynomial Degree
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
# Generate synthetic data from a non-linear function
np.random.seed(42)
X = np.linspace(0, 1, 50).reshape(-1, 1)
y_true = np.sin(2 * np.pi * X).ravel()
y = y_true + np.random.normal(scale=0.2, size=len(y_true)) # add noise
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

degrees = [1, 3, 10]  # simple, medium, very complex

for degree in degrees:
    # Create polynomial features
    poly = PolynomialFeatures(degree=degree, include_bias=False)
    X_train_poly = poly.fit_transform(X_train)
    X_test_poly = poly.transform(X_test)

    # Fit linear regression on polynomial features
    model = LinearRegression()
    model.fit(X_train_poly, y_train)

    # Predictions
    y_train_pred = model.predict(X_train_poly)
    y_test_pred = model.predict(X_test_poly)

    # Errors
    train_mse = mean_squared_error(y_train, y_train_pred)
    test_mse = mean_squared_error(y_test, y_test_pred)
    print(f"Degree {degree}: Train MSE={train_mse:.3f}, Test MSE={test_mse:.3f}")