
Part 10: Hyperparameter tuning with Optuna

The default parameters we used in Part 9 (depth=6, learning_rate=0.05, iterations=500) are reasonable starting points but not optimised. Optuna searches the parameter space to find better values.

We tune on the last fold only - train on 2019-2022, validate on 2023. Tuning on all folds is more rigorous but multiplies compute time by the number of folds. For a 100,000-policy book, 40 trials on a single fold take 15-20 minutes on a standard cluster.

First, extract the last fold data. Create a new cell:

# Use the last fold for tuning: train 2019-2022, validate 2023
# (fold index 2, since folds is 0-indexed)
train_idx_t, val_idx_t = folds[-1]

df_train_t = df_pd.iloc[train_idx_t]
df_val_t   = df_pd.iloc[val_idx_t]

X_train_t = df_train_t[FEATURES]
y_train_t = df_train_t[FREQ_TARGET].values
w_train_t = df_train_t[EXPOSURE_COL].values

X_val_t = df_val_t[FEATURES]
y_val_t = df_val_t[FREQ_TARGET].values
w_val_t = df_val_t[EXPOSURE_COL].values

# Build Pool objects ONCE outside the objective function.
# CatBoost re-encodes categoricals at Pool construction time.
# If you construct inside the objective, this encoding work happens
# on every trial - wasted effort across 40 trials.
train_pool_t = Pool(X_train_t, y_train_t, baseline=np.log(w_train_t), cat_features=CAT_FEATURES)
val_pool_t   = Pool(X_val_t,   y_val_t,   baseline=np.log(w_val_t),   cat_features=CAT_FEATURES)

print(f"Tuning on: {sorted(df_train_t['accident_year'].unique().tolist())} -> validate {sorted(df_val_t['accident_year'].unique().tolist())}")

Now define the Optuna objective function. Create a new cell:

optuna.logging.set_verbosity(optuna.logging.WARNING)  # suppress Optuna's own verbose output

def objective(trial: optuna.Trial) -> float:
    params = {
        "iterations":    trial.suggest_int("iterations", 200, 1000),
        "depth":         trial.suggest_int("depth", 4, 7),
        "learning_rate": trial.suggest_float("learning_rate", 0.02, 0.15, log=True),
        "l2_leaf_reg":   trial.suggest_float("l2_leaf_reg", 1.0, 10.0),
        "loss_function": "Poisson",
        "eval_metric":   "Poisson",
        "random_seed":   42,
        "verbose":       0,
    }
    model = CatBoostRegressor(**params)
    model.fit(train_pool_t, eval_set=val_pool_t)
    pred = model.predict(val_pool_t)
    return poisson_deviance(y_val_t, pred, w_val_t)
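
The objective returns poisson_deviance, which was defined back in Part 9. For reference, a minimal sketch of an exposure-weighted mean Poisson deviance is below - the exact helper in Part 9 may differ in detail, so treat this as an assumption about its form rather than a copy of it:

```python
import numpy as np

def poisson_deviance(y_true, y_pred, weight):
    """Weighted mean Poisson deviance (lower is better).

    Sketch of the Part 9 helper: 2 * (y*log(y/mu) - (y - mu)),
    averaged with the supplied weights.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weight = np.asarray(weight, dtype=float)
    # y * log(y / mu) is taken as 0 when y == 0 (its limit as y -> 0);
    # the where/out trick avoids log(0) warnings for zero-claim rows
    ratio = np.divide(y_true, y_pred, out=np.ones_like(y_true), where=y_true > 0)
    dev = 2.0 * (y_true * np.log(ratio) - (y_true - y_pred))
    return float(np.sum(weight * dev) / np.sum(weight))
```

A perfect prediction gives zero deviance, so minimising this over trials is the right direction for the study.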

What each parameter does:

  • depth: the maximum depth of each tree. Controls how many features can interact. Depth 4 means at most 4-way interactions. For motor data with 5-8 features, depth 4-6 is usually optimal. Depth 7+ overfits without improving validation deviance.
  • learning_rate: how large a step each tree takes. Lower rates require more iterations but generalise better. We search on a log scale (0.02 to 0.15) because the effect is multiplicative.
  • l2_leaf_reg: L2 regularisation on leaf values. Increase this if training deviance is much lower than validation deviance - it is the standard overfitting signal.
  • iterations: the number of trees. Interacts with learning_rate - a low rate needs more iterations to converge. Optuna handles this by exploring the joint space.
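
The learning_rate-iterations coupling can be made concrete with a toy boosting model (a sketch, not CatBoost itself): if each "tree" corrects a fixed fraction lr of the remaining residual, the error left after n trees is (1 - lr)**n, so halving the rate needs roughly double the trees for a similar fit.

```python
def remaining_error(lr, n_trees):
    # Each boosting step shrinks the residual by a factor (1 - lr),
    # so n_trees steps leave (1 - lr)**n_trees of the original error.
    return (1.0 - lr) ** n_trees

# Halving the learning rate with double the trees lands in the same place:
print(remaining_error(0.10, 100))  # ~2.7e-05
print(remaining_error(0.05, 200))  # ~3.5e-05
```

This is why searching the two jointly, as the objective above does, makes sense: a low rate paired with too few iterations underfits, and Optuna can only see that if it varies both.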

Now run the study. Create a new cell:

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=40, show_progress_bar=True)

best_params = study.best_params
print(f"\nBest Poisson deviance (40 trials): {study.best_value:.5f}")
print("\nBest parameters:")
for k, v in best_params.items():
    print(f"  {k}: {v}")

Run this cell. Optuna runs the 40 trials; on Free Edition this takes 10-20 minutes. The progress bar shows completed trials. Let it run.

What Optuna is doing internally: The first 10-15 trials are essentially random exploration. After that, Optuna uses a Tree-structured Parzen Estimator (TPE) to concentrate subsequent trials on the most promising regions of the parameter space. The marginal improvement from trials 30-40 is typically small compared to trials 1-20 - we use 40 to be thorough.

After it finishes, run this in the next cell to see which parameters drove most of the variation across trials:

importances = optuna.importance.get_param_importances(study)
print("Parameter importances (what drove trial-to-trial variation):")
for param, imp in sorted(importances.items(), key=lambda x: x[1], reverse=True):
    print(f"  {param}: {imp:.3f}")

On UK motor data with 5-8 features, typical results: depth accounts for 40-55% of trial variance, learning_rate 25-35%, l2_leaf_reg 10-20%, iterations 5-15%. This tells you that depth is the parameter worth tuning most carefully. If compute time is limited, fixing depth=5 and tuning only learning_rate and iterations in 20 trials will get you within 0.001-0.002 deviance of the full search.