Technical articles on UK insurance pricing: GBMs, SHAP relativities, cross-validation, credibility theory, conformal prediction, spatial ratemaking, and FCA compliance.
Standard renewal demand models overestimate price sensitivity for bad risks and underestimate it for good ones — because risk drives both premium and lapse. ...
Why standard churn models fail for UK personal lines, and what to do instead. Introducing insurance-survival: covariate-adjusted cure models, survival-adjust...
Every insurer runs discount campaigns at renewal. Most target by propensity to lapse — who will leave? The correct question is who will respond to a discount...
UK motor insurers charge under-25s roughly three times the premium of drivers aged 25-30. The risk discontinuity at age 25 is assumed, not demonstrated. insurance...
NCD is a crude proxy for individual risk. insurance-experience implements actuarially correct posterior experience rating at policy level: static Bühlmann-St...
R has had the dglm package since 1998. Python has had nothing. insurance-dispersion is a Double GLM implementation that gives every policy its own dispersion...
Every SIU referral threshold in UK motor insurance is arbitrary — 'refer the top 5%' or 'anything above 0.7'. Conformal p-values fix this by giving each clai...
Policyholders game rating thresholds. Mileage declarations cluster just below 10,000. Claimed ages spike just above 25. Declared sums bunch at £100k, £200k, ...
The CLS token's self-attention weight in the Credibility Transformer is not an analogy for Bühlmann-Straub credibility — it is mathematically identical to it...
insurance-conformal v0.2 adds SCRReport for distribution-free 99.5% upper bounds, LocallyWeightedConformal for intervals ~24% narrower than standard split co...
FCA EP25/2 admitted that individual-level ENBP counterfactual analysis 'is unable to be conducted'. Lei & Candès 2021 weighted conformal inference says other...
Most UK insurers receive a black-box telematics score from vendors like Octo or The Floow. They cannot inspect it, tune it, or connect it to their rating met...
SHAP explains individual predictions. Shapley effects decompose portfolio-level variance. insurance-sensitivity brings Owen 2014, Song 2016 and Rabitti-Tzoug...
Factor level collapsing is the highest-labour, lowest-reproducibility activity in GLM development. insurance-glm-cluster automates it using the R2VF algorith...
insurance-fairness-ot implements discrimination-free insurance pricing via Lindholm (2022) marginalisation, Côté-Genest-Abdallah (2025) causal path decomposi...
insurance-fairness-diag implements D_proxy (LRTW 2026, SSRN 4897265), Owen (2014) Shapley attribution, and Côté-Charpentier (2025) local vulnerability scores...
Your GLM predicts a mean. DRN refines it into a full predictive distribution — per risk, per policy, with parametric tails for the extremes that matter for c...
GLMs model the mean. GAMLSS models everything. insurance-distributional-glm is the first production-ready Python implementation of Generalised Additive Model...
A GLM that prices your motor book correctly on average will still misprice the tail. insurance-drn wraps any baseline model with a neural network that refine...
Raw experience rating tables are noisy at the edges: young drivers, high-mileage vehicles, rare postcodes. Whittaker-Henderson smoothing is the actuarial s...
How to go from a fitted CatBoost frequency model to BYM2 spatial territory factors for Emblem or Radar - with the actual data engineering, convergence checks...
How to handle 800+ vehicle makes and 9,000+ postcode sectors in a standard GLM without sacrificing interpretability. A four-phase pipeline that uses neural n...
Generic synthetic data tools — SDV, CTGAN, TVAE — produce portfolios that look plausible column-by-column and break down as soon as you run a pricing model o...
A Poisson z-test tells you if your totals balance. An auto-calibration test tells you if each price cohort is self-financing. The Murphy decomposition tells ...
InterpretML's ExplainableBoostingMachine already handles Poisson/Gamma/Tweedie loss and exposure offsets natively. What it doesn't do is wrap those capabilit...
Where Double Machine Learning outperforms naive regression in insurance pricing, and where it doesn't. Practical benchmarks on synthetic motor data.
A GLM gives you technically adequate prices. Constrained optimisation finds the factor adjustments that hit your LR target, satisfy FCA PS21/5 ENBP, respect ...
CatBoost's MultiQuantile loss gives you quantile predictions. It does not give you TVaR, large loss loadings, ILF curves, or exceedance probabilities. insura...
Applying rate changes without solving the demand-constraint system simultaneously is a guaranteed route to suboptimal profit. insurance-optimise is an open-s...
When exp(beta) from a GLM is not what you think it is. How omitted variable bias and confounding distort rating factor estimates, and how Double Machine Lear...
Two risks with identical expected loss can have wildly different variance. Standard Tweedie GBMs cannot tell them apart. insurance-distributional — the first...
How to detect and correct proxy discrimination in UK insurance pricing models. Using SHAP and the insurance-fairness library to identify protected characteri...
PSI and aggregate A/E are not enough. A three-layer monitoring framework - feature drift, segmented calibration, and a formal Gini test - that tells you whet...
Naive price elasticity estimates from insurance quote data are biased - risk drives both premium and lapse. The insurance-demand library implements Double Ma...
An open-source Python library that distils GBM models into multiplicative GLM factor tables for Radar, Emblem, and other rating engines. The first open-sourc...
UK motor GLMs test a handful of interactions out of hundreds of possible pairs. insurance-interactions automates the search using CANN residual modelling, Ne...
A Python library for NCD/bonus-malus systems in UK motor insurance. Includes the non-obvious finding that optimal NCD claiming thresholds peak at 20% NCD, no...
How to build a demand model for UK personal lines pricing: conversion, retention, price elasticity, and demand curves. Covers FCA GIPP requirements and the t...
GLM coefficients measure association, not causation. How Double Machine Learning isolates the causal effect of rating factors from confounding, and why this ...
Standard k-fold cross-validation is wrong for insurance pricing models. How temporal leakage and IBNR contamination inflate CV scores, and how walk-forward v...
Why postcode sector k-means banding is statistically wrong for territory ratemaking, and how to use the BYM2 spatial model in PyMC to borrow strength across ...
A complete Databricks workflow for UK pricing actuaries: CatBoost training with MLflow tracking, SHAP relativities extraction, and export to Radar. End-to-en...
How to build a rate change that meets a target loss ratio, respects movement caps, and minimises cross-subsidy simultaneously. Linear programming for UK pers...
Conformal prediction intervals for insurance GBMs that are statistically honest. Distribution-free coverage guarantees for individual risk predictions, not c...
Bühlmann-Straub credibility in Python for blending thin segment experience with portfolio rates. Covers the mathematics, its equivalence to mixed models, and...
How to extract multiplicative rating relativities from CatBoost GBMs using SHAP values - the same format as exp(beta) from a GLM, with confidence intervals, ...
Partial pooling for thin rating cells in UK motor pricing. How the bayesian-pricing library uses hierarchical Bayesian models to stabilise sparse segments wi...
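As a rough illustration of the credibility blending that several of the articles above rely on, here is a minimal sketch of the textbook Bühlmann-Straub estimator in plain Python. It is not the API of any library listed here: the function name `buhlmann_straub`, the segment labels, and the figures in `example_groups` are all hypothetical and made up for illustration. It assumes at least two segments and at least one segment with two or more periods of experience.

```python
from typing import Dict, List, Tuple

# Each observation is (observed claim frequency, exposure weight) for one period.
Observation = Tuple[float, float]


def buhlmann_straub(
    groups: Dict[str, List[Observation]],
) -> Tuple[float, Dict[str, Dict[str, float]]]:
    """Textbook nonparametric Bühlmann-Straub estimator (illustrative sketch)."""
    # Exposure-weighted mean per segment, and for the whole portfolio.
    weights = {g: sum(w for _, w in obs) for g, obs in groups.items()}
    means = {g: sum(x * w for x, w in obs) / weights[g] for g, obs in groups.items()}
    total_w = sum(weights.values())
    grand_mean = sum(weights[g] * means[g] for g in groups) / total_w

    # Expected process variance (EPV): within-segment volatility.
    within = sum(w * (x - means[g]) ** 2 for g, obs in groups.items() for x, w in obs)
    dof = sum(len(obs) - 1 for obs in groups.values())
    epv = within / dof

    # Variance of hypothetical means (VHM): between-segment volatility,
    # floored at zero because the unbiased estimator can go negative.
    between = sum(weights[g] * (means[g] - grand_mean) ** 2 for g in groups)
    norm = total_w - sum(w ** 2 for w in weights.values()) / total_w
    vhm = max((between - (len(groups) - 1) * epv) / norm, 0.0)

    if vhm == 0.0:
        # No detectable heterogeneity: every segment gets the portfolio rate.
        z = {g: 0.0 for g in groups}
        collective = grand_mean
    else:
        k = epv / vhm
        z = {g: weights[g] / (weights[g] + k) for g in groups}
        # Credibility-weighted collective mean.
        collective = sum(z[g] * means[g] for g in groups) / sum(z.values())

    result = {
        g: {
            "z": z[g],
            "mean": means[g],
            "premium": z[g] * means[g] + (1.0 - z[g]) * collective,
        }
        for g in groups
    }
    return collective, result


# Hypothetical data: three segments, three periods each.
example_groups = {
    "A": [(0.10, 1000.0), (0.12, 1100.0), (0.11, 1200.0)],
    "B": [(0.20, 50.0), (0.05, 60.0), (0.15, 40.0)],
    "C": [(0.08, 500.0), (0.07, 550.0), (0.09, 600.0)],
}
collective, result = buhlmann_straub(example_groups)
for name, row in result.items():
    print(name, round(row["z"], 3), round(row["premium"], 4))
```

The thin segment B ends up with the lowest credibility factor, so its blended rate sits much closer to the collective mean than to its own noisy experience.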