Hyperparameter Tuning
Learn how to systematically search for good hyperparameters using grid search and random search with cross-validation.
What are Hyperparameters?
Hyperparameters are settings you choose before training (e.g., number of trees, learning rate, regularization strength). They are not learned from data.
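The distinction can be seen directly in scikit-learn. The following is a minimal sketch, assuming a LogisticRegression model on the iris data (neither choice comes from the text above; any estimator would illustrate the same point): values such as C are passed to the constructor before training, while attributes such as coef_ appear only after fit() has learned them.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# C (inverse regularization strength) and max_iter are hyperparameters:
# we pick them before calling fit(). The values here are illustrative.
model = LogisticRegression(C=0.5, max_iter=1000)
hyperparams = model.get_params()
print("Chosen before training:", hyperparams["C"], hyperparams["max_iter"])

# coef_ and intercept_ are learned parameters: they do not exist until fit().
has_coef_before = hasattr(model, "coef_")
model.fit(X, y)
has_coef_after = hasattr(model, "coef_")

print("coef_ present before fit:", has_coef_before)
print("coef_ present after fit:", has_coef_after)
print("Learned coefficient matrix shape:", model.coef_.shape)
```

Changing C changes how the coefficients are learned, but C itself is never updated by training, which is why it has to be tuned from the outside.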
Grid Search
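Grid search evaluates every combination in a grid of candidate hyperparameter values, scores each one with cross-validation, and selects the combination with the best value of a chosen metric. The number of combinations is the product of the number of values per hyperparameter, so the cost grows quickly as the grid widens.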
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Load the data and hold out a stratified test set that the search never sees.
iris = load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rf = RandomForestClassifier(random_state=42)

# 3 * 3 * 2 = 18 candidate combinations.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5],
}

# Each candidate is scored with 5-fold cross-validation on the training set.
grid_search = GridSearchCV(
    rf,
    param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
)
grid_search.fit(X_train, y_train)

print("Best params:", grid_search.best_params_)
print("Best CV score:", grid_search.best_score_)
# score() uses the best estimator, refit on the full training set (refit=True by default).
print("Test accuracy:", grid_search.score(X_test, y_test))
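Before launching a grid search it can help to estimate how much work it implies. The sketch below uses scikit-learn's ParameterGrid to enumerate the same grid as the example above and count the model fits; the names cv_folds, n_candidates, and total_fits are illustrative, not part of the original example.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as the GridSearchCV example.
param_grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5],
}
cv_folds = 5  # matches cv=5 in the example

# ParameterGrid expands the dict into one dict per combination.
candidates = list(ParameterGrid(param_grid))
n_candidates = len(candidates)

# Every candidate is trained once per fold during the search.
total_fits = n_candidates * cv_folds

print("Candidates:", n_candidates)
print("Cross-validation fits:", total_fits)
print("First candidate:", candidates[0])
```

With the default refit=True, GridSearchCV also trains the winning combination once more on the full training set, so the search performs one fit beyond the cross-validation total.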
Random Search
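Random search evaluates a fixed number of combinations (n_iter), each drawn at random from specified distributions or lists of values. Because its cost is set by n_iter rather than by the size of the grid, it is often more efficient than grid search when there are many hyperparameters.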
RandomizedSearchCV Example
# Reuses X_train, X_test, y_train, y_test from the grid search example above.
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

rf = RandomForestClassifier(random_state=42)

# randint(low, high) draws integers from low to high - 1 (high is exclusive).
param_dist = {
    "n_estimators": randint(50, 200),
    "max_depth": [None, 5, 10, 20],
    "min_samples_split": randint(2, 10),
}

# n_iter=10 caps the search at 10 sampled candidates, each scored with 5-fold CV.
random_search = RandomizedSearchCV(
    rf,
    param_dist,
    n_iter=10,
    cv=5,
    scoring="accuracy",
    random_state=42,
    n_jobs=-1,
)
random_search.fit(X_train, y_train)

print("Best params:", random_search.best_params_)
print("Best CV score:", random_search.best_score_)
print("Test accuracy:", random_search.score(X_test, y_test))
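To see what random search actually tries, the candidates can be drawn without training anything. This is a rough sketch using scikit-learn's ParameterSampler with the same distributions as the example above; the variable name sampled is illustrative, and the draws are not guaranteed to match the candidates RandomizedSearchCV evaluates in every library version.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterSampler

# Same search space as the RandomizedSearchCV example.
param_dist = {
    "n_estimators": randint(50, 200),      # integers 50..199
    "max_depth": [None, 5, 10, 20],        # lists are sampled uniformly
    "min_samples_split": randint(2, 10),   # integers 2..9
}

# Draw 10 candidate settings; random_state makes the draws repeatable.
sampled = list(ParameterSampler(param_dist, n_iter=10, random_state=42))

for params in sampled:
    print(params)
```

Only these 10 settings would be cross-validated, regardless of how many hyperparameters or values the search space contains, which is where the efficiency over an exhaustive grid comes from.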