Part 12: Feature importances
Part 12: Feature importances¶
Before moving to the severity model, let us look at what the frequency model is relying on. Create a new cell:
importances = freq_model.get_feature_importance(type="FeatureImportance")
imp_df = (
pl.DataFrame({"feature": FEATURES, "importance": importances.tolist()})
.sort("importance", descending=True)
)
print(imp_df)
Run this. You will see something like:
shape: (5, 2)
feature importance
ncd_years 35.2
driver_age 28.7
vehicle_group 19.4
area 10.8
conviction_points 5.9
The exact values vary, but NCD years and driver age are usually dominant for UK motor frequency.
What this metric measures: CatBoost's default importance is PredictionValuesChange - the average change in the prediction when a feature's value is varied across the training data, normalised to sum to 100. It is a portfolio-level summary. It tells you which features the model is using overall, but not how any individual feature affects a specific prediction or what direction the effect is. Module 4 covers SHAP values for that purpose.
For the pricing committee presentation, feature importances answer "what is the model using?" The double lift chart (Part 14) answers "is it finding real risk differentiation?" You need both.