Testing models before they go to production. Gini, double-lift, calibration curves, out-of-time validation, and the specific failure modes that standard ML metrics miss on insurance data. 11 articles.
We planted three simultaneous model failures in a 50,000-policy UK motor book. The aggregate A/E never triggered. The library detected the first problem after 1,500 policies. He...
Parametric Tweedie intervals undercover high-risk policies by 10–15 percentage points. We tested conformal prediction on 50,000 UK motor policies to find out whether the fix act...
We ran Double Machine Learning against a naive GLM on a 50,000-policy UK motor telematics book. The GLM overestimated the treatment effect by 50–90%. Here is what that means for...
Benchmark results on a known-DGP synthetic UK motor book. EBM beats the GLM by 35 Gini points. But the deviance number is misleading. We explain why, and when you should care.
Benchmark results on a known-DGP synthetic UK motor fleet. HMM state fractions deliver 5–10pp Gini lift over simple aggregates. State classification recovers >50% of true high-r...
Honest benchmark: does fitting a surrogate GLM on CatBoost pseudo-predictions recover more discriminatory power than a direct GLM? We test it on 30,000 synthetic UK motor policies.
Benchmark results on a known-DGP synthetic UK motor age curve. REML recovers the true frequency well in the data-rich middle. The tails are a different story. Numbers, not claims.
We read the source, ran the benchmark, and checked the claim: the independence assumption in standard two-part GLMs is wrong for UK motor, and this library corrects it analytica...
Aggregate A/E at 0.94 looks fine. The model has been mispricing under-25s for eight months. Benchmark results on a synthetic UK motor book with three planted failure modes.
We ran the benchmarks. On a synthetic UK motor book with nonlinear confounding, naive logistic GLM overestimates the telematics treatment effect by 50–90%. DML recovers the grou...
Benchmark results on a known-DGP synthetic motor book. Conformal hits 90% across all deciles. Parametric Tweedie under-covers the top decile by 10–15pp. Numbers, not theory.
Benchmark results on 100 synthetic schemes with known true loss rates. Credibility blending reduces MSE by 25–35% vs the best naive alternative. Numbers, not theory.