Part 1: The workflow we are building
We are building a motor frequency-severity model: a Poisson GLM for claim frequency and a Gamma GLM for average severity, both with a log link. The frequency model also carries an exposure offset (a correction that adjusts for each policy's earned duration — explained fully in Part 4). The pure premium estimate is the product: frequency times severity.
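Because both models use a log link, multiplying the two predictions is equivalent to exponentiating the sum of the two linear predictors. A minimal sketch, using made-up linear-predictor values for one policy (the numbers are illustrative, not from any fitted model):

```python
import math

# Hypothetical fitted linear predictors for a single policy.
# With a log link, a GLM's prediction is exp(linear predictor).
freq_linear_predictor = -2.12   # from the Poisson frequency GLM
sev_linear_predictor = 7.82     # from the Gamma severity GLM

expected_frequency = math.exp(freq_linear_predictor)  # claims per unit exposure
expected_severity = math.exp(sev_linear_predictor)    # average cost per claim
pure_premium = expected_frequency * expected_severity

# Multiplying the predictions equals exponentiating the summed predictors:
assert abs(pure_premium - math.exp(freq_linear_predictor + sev_linear_predictor)) < 1e-9
```

This additivity on the log scale is what makes the factor tables in the final export step multiplicative.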
The data pipeline, in order:
- Generate a synthetic UK motor dataset with known true parameters
- Prepare features: encode categorical factors, handle base levels, check for data quality issues
- Fit the frequency GLM (Poisson with log link and exposure offset)
- Fit the severity GLM (Gamma with log link, on claimed policies only)
- Run diagnostics: deviance residuals, actual-versus-expected by factor level
- Validate against known true parameters (and later, against Emblem output)
- Export factor tables in the format Radar expects
We use synthetic data throughout because its true parameters are known. This lets us verify that our GLM recovers the right answers: if the model works correctly on synthetic data, we can trust the same workflow on real data.