Empirical tests of whether specific methods actually deliver in insurance practice. The 'Does X actually work?' series runs each technique against held-out UK insurance datasets and reports Gini improvement, calibration, and failure modes honestly.
Benchmark results on a known-DGP synthetic UK motor book. EBM beats the GLM by 35 Gini points. But the deviance number is misleading. We explain why, and when you should care.
Benchmark results on a known-DGP synthetic UK motor fleet. HMM state fractions deliver 5–10pp Gini lift over simple aggregates. State classification recovers >50% of true high-r...
Honest benchmark: does fitting a surrogate GLM on CatBoost pseudo-predictions recover more discriminatory power than a direct GLM? We test it on 30,000 synthetic UK motor policies.
Benchmark results on synthetic UK motor renewal books. The constrained optimiser outperforms flat rate changes on profit and retention simultaneously. What it does not do: fix a...
Benchmark results on a known-DGP synthetic UK motor age curve. REML recovers the true frequency well in the data-rich middle. The tails are a different story. Numbers, not claims.
We read the source, ran the benchmark, and checked the claim: the independence assumption in standard two-part GLMs is wrong for UK motor, and this library corrects it analytica...
We ran the insurance-fairness proxy detection library against a synthetic motor book with planted proxy effects and compared it against the manual correlation check most teams a...
Aggregate A/E at 0.94 looks fine. The model has been mispricing under-25s for eight months. Benchmark results on a synthetic UK motor book with three planted failure modes.
We ran the benchmarks. On a synthetic UK motor book with nonlinear confounding, a naive logistic GLM overestimates the telematics treatment effect by 50–90%. DML recovers the grou...
Benchmark results on a known-DGP synthetic motor book. Conformal intervals hit 90% coverage across all deciles. Parametric Tweedie intervals under-cover the top decile by 10–15pp. Numbers, not theory.
Benchmark results on 100 synthetic schemes with known true loss rates. Credibility blending reduces MSE by 25–35% vs the best naive alternative. Numbers, not theory.
Insurance walk-forward cross-validation prevents the look-ahead bias that makes standard k-fold results useless for prospective evaluation. Complete Python example with insuranc...
TabPFN and TabICLv2 for thin-segment UK insurance pricing. In-context learning at inference, no gradient descent. insurance-thin-data wraps both for actuaries.
Correct covariate shift when acquiring an MGA book for UK motor pricing. Importance weighting, density ratio estimation, segment-level diagnostics - Python.
GARCH for UK insurance claims inflation: time-varying variance in trend analysis. insurance-garch - Engle's (1982) ARCH framework and its GARCH extension, applied to actuarial trend and pricing models.
PRA SS1/23 requires quantitative pass/fail tests, not narrative. insurance-governance automates the full validation suite and generates auditable HTML reports.
Standard k-fold CV is wrong for insurance pricing. Temporal leakage and IBNR contamination inflate scores. Walk-forward validation fixes both - Python.