Skip to content

Part 13: Rebuilding the GLM with interactions

Part 13: Rebuilding the GLM with interactions

With the suggested interaction pairs identified, we refit the GLM jointly with all approved interactions.

The difference between test_interactions and build_glm_with_interactions

test_interactions (which ran inside detector.fit()) tests each pair in isolation: for each candidate, it fits a GLM with just that interaction added and reports the deviance gain. This is correct for deciding which interactions to add.

build_glm_with_interactions fits one GLM with all approved interactions simultaneously. The joint deviance gain is typically smaller than the sum of the individual gains, because the interactions overlap: age × vehicle group and age × annual mileage share the age factor, so some of the gain from the second interaction was already captured by the first.

from insurance_interactions import build_glm_with_interactions

# Refit the GLM with the recommended interaction pairs
enhanced_glm, comparison = build_glm_with_interactions(
    X=X,
    y=y,
    exposure=exposure_arr,
    interaction_pairs=suggested,
    family="poisson",
)

print("Model comparison:")
print(comparison)

Expected output:

shape: (2, 8)
┌───────────────────────────┬─────────────┬──────────┬─────────────┬─────────────┬─────────────────┬───────────────────┬──────────────┐
 model                      deviance     n_params  aic          bic          delta_deviance   delta_deviance_pct n_new_params 
 ---                        ---          ---       ---          ---          ---              ---                ---          
 str                        f64          i64       f64          f64          f64              f64                i64          
╞═══════════════════════════╪═════════════╪══════════╪═════════════╪═════════════╪═════════════════╪═══════════════════╪══════════════╡
 base_glm                   98432.x      19        98470.x      98627.x      0.0              0.0                0            
 glm_with_interactions      96850.x      43        96936.x      97135.x      1582.x           1.61               24           
└───────────────────────────┴─────────────┴──────────┴─────────────┴─────────────┴─────────────────┴───────────────────┴──────────────┘

The exact numbers will vary slightly depending on the random seed and CANN training, but you should see: - delta_deviance of several hundred to a few thousand (capturing the planted interactions) - n_new_params reflecting the parameter cost of your suggested interactions - AIC and BIC both lower for the interaction model (negative delta is better)

Inspect the interaction GLM coefficients

# The enhanced_glm is a fitted glum GeneralizedLinearRegressor
# We can inspect its coefficients
print(f"Base GLM parameters:      {len(glm_base.coef_) + 1}")
print(f"Enhanced GLM parameters:  {len(enhanced_glm.coef_) + 1}")
print()

# Show interaction coefficients (columns starting with _ix_)
coef_names = enhanced_glm.feature_names_in_
ix_cols    = [c for c in coef_names if c.startswith("_ix_")]
ix_coefs   = [enhanced_glm.coef_[list(coef_names).index(c)] for c in ix_cols]

print("Interaction term coefficients:")
for name, coef in sorted(zip(ix_cols, ix_coefs), key=lambda x: abs(x[1]), reverse=True):
    print(f"  {name:<40} {coef:+.4f}  (relativity: {np.exp(coef):.3f})")

What this shows: The interaction terms are named _ix_age_band_vehicle_group (for categorical × categorical, the library creates a combined categorical column). The coefficients tell you the additional log-multiplicative adjustment for each combination of age band and vehicle group levels, on top of the main effects.

For the planted interaction (age 17-21, vehicle group 41-50), you should see positive interaction coefficients in the region of +0.25 to +0.35, consistent with the planted 0.30 log-unit bump.