Part 9: The cross-validation loop
Part 9: The cross-validation loop¶
Now we put it together. The CV loop trains a fresh CatBoost model on each fold and evaluates it on the held-out year. We track the deviance across folds to get a mean and variance.
Create a new cell:
cv_deviances = []
for fold_idx, (train_idx, val_idx) in enumerate(folds):
df_train = df_pd.iloc[train_idx]
df_val = df_pd.iloc[val_idx]
X_train = df_train[FEATURES]
y_train = df_train[FREQ_TARGET].values
w_train = df_train[EXPOSURE_COL].values
X_val = df_val[FEATURES]
y_val = df_val[FREQ_TARGET].values
w_val = df_val[EXPOSURE_COL].values
train_pool = Pool(X_train, y_train, baseline=np.log(w_train), cat_features=CAT_FEATURES)
val_pool = Pool(X_val, y_val, baseline=np.log(w_val), cat_features=CAT_FEATURES)
model = CatBoostRegressor(
iterations=500,
learning_rate=0.05,
depth=6,
loss_function="Poisson",
eval_metric="Poisson",
random_seed=42,
verbose=0, # 0 means no training progress output per fold
)
model.fit(train_pool, eval_set=val_pool)
y_pred = model.predict(val_pool)
fold_dev = poisson_deviance(y_val, y_pred, w_val)
cv_deviances.append(fold_dev)
val_year = sorted(df_pd.iloc[val_idx]["accident_year"].unique().tolist())
print(f"Fold {fold_idx+1} (validate {val_year}): Poisson deviance = {fold_dev:.4f}")
print(f"\nMean CV deviance: {np.mean(cv_deviances):.4f}")
print(f"Std CV deviance: {np.std(cv_deviances):.4f}")
Run this cell. It trains three CatBoost models. On Free Edition this takes 2-4 minutes. The output should look like:
Fold 1 (validate [2022]): Poisson deviance = 0.1923
Fold 2 (validate [2023]): Poisson deviance = 0.1887
Fold 3 (validate [2024]): Poisson deviance = 0.1841
The exact numbers will vary slightly. What you are looking for is that the three folds produce similar deviances - if one fold is substantially worse than the others, it may indicate a data quality issue in that year or a genuine model instability.
A note on what the standard deviation means: Three folds gives a crude estimate of variance, not a proper confidence interval. Do not report a confidence interval computed from three folds as if it is statistically rigorous. What you can say is: "CV deviance across three temporal folds ranged from X to Y, with a mean of Z."
What verbose=0 does: By default, CatBoost prints training progress on every iteration. On 500 iterations across three folds, that is 1,500 lines of output. Setting verbose=0 suppresses this. If you want to see progress (useful when debugging), change it to verbose=100 to print every 100 iterations.