Hyperparameter Tuning

Learn how to systematically search for good hyperparameters using grid search and random search with cross-validation.

What are Hyperparameters?

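Hyperparameters are settings you choose before training, such as the number of trees, the learning rate, or the regularization strength. Unlike model parameters (for example, the split thresholds inside each tree), they are not learned from the data during fitting.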
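To make the distinction concrete, here is a minimal sketch that sets hyperparameters by hand and then looks at what the model learns. The specific values (25 trees, depth 3, random_state=0) are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by us before training (values here are arbitrary).
model = RandomForestClassifier(n_estimators=25, max_depth=3, random_state=0)
hyperparams = model.get_params()
print("n_estimators:", hyperparams["n_estimators"])
print("max_depth:", hyperparams["max_depth"])

# Learned state: these attributes only exist after fit() has seen the data.
model.fit(X, y)
print("Trees built:", len(model.estimators_))
print("Feature importances:", model.feature_importances_)
```

get_params() returns whatever was passed to the constructor (plus defaults), while attributes ending in an underscore, such as estimators_ and feature_importances_, are produced by fitting.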

Grid Search

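Grid search evaluates every combination of values in a grid of hyperparameters and selects the combination with the best cross-validated score on a chosen metric. The cost grows multiplicatively: the number of model fits is the product of the number of values for each hyperparameter, times the number of cross-validation folds.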
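Before launching a search, it can help to count how many fits a grid implies. The sketch below uses scikit-learn's ParameterGrid to enumerate a small illustrative grid; the 5-fold setting is an assumption chosen for the example.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5],
}

# ParameterGrid expands the dict into every combination of values.
combos = list(ParameterGrid(param_grid))

n_folds = 5                          # assumed number of cross-validation folds
n_candidates = len(combos)           # 3 * 3 * 2 combinations
n_cv_fits = n_candidates * n_folds   # one fit per combination per fold

print("Candidates:", n_candidates)
print("Cross-validation fits:", n_cv_fits)
print("First candidate:", combos[0])
```

Adding one more hyperparameter with three values would triple the number of fits, which is why large grids quickly become expensive.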

GridSearchCV with RandomForestClassifier

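from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target

# Hold out a stratified test set; the search only ever sees the training split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rf = RandomForestClassifier(random_state=42)

# 3 x 3 x 2 = 18 candidate combinations.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5],
}

# With cv=5 this runs 18 x 5 = 90 fits, then refits the best
# combination on all of X_train (refit=True is the default).
grid_search = GridSearchCV(
    rf,
    param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,  # use all available CPU cores
)

grid_search.fit(X_train, y_train)

print("Best params:", grid_search.best_params_)
# Mean cross-validated accuracy of the best combination.
print("Best CV score:", grid_search.best_score_)
# score() uses the refitted best estimator on the held-out test set.
print("Test accuracy:", grid_search.score(X_test, y_test))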
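Beyond the single best result, a fitted search keeps per-candidate scores in its cv_results_ attribute, which is useful for checking how close the runners-up are. The following is a self-contained sketch with a deliberately tiny grid and 3 folds (both assumptions made to keep it fast); it ranks every candidate by mean cross-validated accuracy.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# A tiny grid (2 x 2 = 4 candidates) so the sketch runs quickly.
small_grid = {"n_estimators": [10, 30], "max_depth": [2, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    small_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

results = search.cv_results_

# Sort candidate indices by rank (rank 1 is the best mean test score).
order = results["rank_test_score"].argsort()
for i in order:
    print(
        results["rank_test_score"][i],
        results["params"][i],
        "mean=%.3f" % results["mean_test_score"][i],
        "std=%.3f" % results["std_test_score"][i],
    )
```

If several candidates score within one standard deviation of each other, the simpler or cheaper configuration may be a reasonable choice.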

Random Search

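Random search evaluates a fixed number of combinations (n_iter), each sampled at random from the lists or distributions you specify. Because the budget is set by n_iter rather than by the size of a grid, it is often more efficient than grid search when there are many hyperparameters.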
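To see what "sampling combinations" means in practice, the sketch below draws a few candidates with scikit-learn's ParameterSampler, without fitting any model. The ranges, n_iter=5, and random_state=0 are illustrative assumptions.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

param_dist = {
    # randint(low, high) draws integers from low to high - 1 (upper bound excluded).
    "n_estimators": randint(50, 200),
    # A plain list is sampled uniformly.
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": randint(2, 10),
}

# Draw 5 random combinations; random_state makes the draw reproducible.
samples = list(ParameterSampler(param_dist, n_iter=5, random_state=0))

for s in samples:
    print(s)
```

Each printed dictionary is one candidate that a randomized search would evaluate with cross-validation.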

RandomizedSearchCV Example

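from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestClassifier
from scipy.stats import randint

# Reuses X_train, X_test, y_train, y_test from the grid search example above.
rf = RandomForestClassifier(random_state=42)

param_dist = {
    # randint(low, high) samples integers from low to high - 1.
    "n_estimators": randint(50, 200),
    # Lists are sampled uniformly.
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": randint(2, 10),
}

# n_iter=10 candidates x 5 folds = 50 fits, regardless of how large
# the search space is; the best candidate is then refit on X_train.
random_search = RandomizedSearchCV(
    rf,
    param_dist,
    n_iter=10,
    cv=5,
    scoring="accuracy",
    random_state=42,  # makes the sampled candidates reproducible
    n_jobs=-1,
)

random_search.fit(X_train, y_train)

print("Best params:", random_search.best_params_)
print("Best CV score:", random_search.best_score_)
print("Test accuracy:", random_search.score(X_test, y_test))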
