Validation

42 articles in this topic

04 Apr 2026

Is Your Model Improvement Worth Building? The Loss Ratio Error Framework

C. Evans Hedges (Lemonade, December 2025) derives the first closed-form formula connecting model discrimination to expected loss ratio. LRE translates a correlation improvement ...
03 Apr 2026

PAVA in Three Places: Isotonic Regression for Insurance Pricing

The Pool Adjacent Violators Algorithm solves an O(N) monotonicity problem with no parametric assumptions. It appears in three distinct insurance pricing contexts: as the link fu...
03 Apr 2026

Validating a Mixture Severity Model: When NE-GMM Earns Its Keep and When GammaGBM Still Wins

NeuralGaussianMixture is now in insurance-distributional v0.4.0. The question is not whether it can fit bimodal severity — it can. The question is whether your data actually nee...
02 Apr 2026

Can LLMs Pass Their Insurance Exams? The Wrong Question for Pricing Teams

Beauchemin & Khoury (arXiv:2603.07825) benchmark 51 LLMs on Quebec insurance regulatory certification questions. Passing insurance exams is the wrong success metric for pricing ...
01 Apr 2026

TabPFN as a Conditional Density Estimator: What the Benchmark Actually Shows for Severity Pricing

Izbicki and Rodrigues (arXiv:2603.26611, March 2026) benchmark TabPFN-2.5, RealTabPFN-2.5 and TabICL-Quantiles as conditional density estimators across 39 datasets. The thin-dat...
01 Apr 2026

Two Things Random Splits and Pearson Correlation Get Wrong in Insurance Data

insurance-cv v0.3.0 adds SupportPointSplit (distributional train-test splitting via energy distance minimisation) and ChatterjeeSelector (nonlinear feature screening using Chatt...
31 Mar 2026

Tab-TRM: Best on the Benchmark, Not the Right Starting Point

Tab-TRM sets the French MTPL benchmark at 23.589×10⁻² Poisson deviance, beating PIN ensemble by 0.3%. The linearisation result — Tab-TRM is approximately a state-space model — i...
31 Mar 2026

Testing Conditional Coverage in Conformal Prediction — The ERT Diagnostic

Conformal prediction gives valid marginal coverage but says nothing about conditional coverage — your intervals can fail for young drivers or flood-zone properties while the por...
31 Mar 2026

Competing Risks Calibration: Why Your Fine-Gray Validation Is Wrong

D-calibration and ICI are mathematically invalid for competing-risks models. If F_k(inf|x) < 1 — which is always true for lapse, claim, and MTA competing causes — the probabilit...
28 Mar 2026

Tabular Foundation Models for Insurance Pricing — Do They Work?

An honest assessment of where tabular foundation models stand in March 2026 — what the benchmarks actually show, what's missing for insurance pricing, and which models are worth...
28 Mar 2026

TabPFN vs CatBoost vs GLM on freMTPL2: The Exposure Offset Problem

Three-way benchmark on 677K French motor policies. TabPFN cannot handle log-exposure offsets — the structural limitation that makes it unviable for bread-and-butter Poisson freq...
28 Mar 2026

What PRA SS1/23 Validation Looks Like on Real Data: 677K French Motor Policies

Most governance tooling is tested on toy examples with clean DGPs and inflated Gini coefficients. We ran the full insurance-governance validation suite on 677K freMTPL2 policies...
28 Mar 2026

Does Whittaker-Henderson Smoothing Actually Work for Insurance Pricing?

We benchmarked Whittaker-Henderson against raw rates and a 5-point weighted moving average on a synthetic UK motor driver age curve with known truth. W-H reduces MSE by 57.2% vs...
28 Mar 2026

Does Sarmanov Copula Frequency-Severity Modelling Actually Work?

The standard UK motor pricing formula multiplies E[N] by E[S] and assumes independence. On a 15,000-policy benchmark with planted omega=3.5, that assumption understates portfoli...
28 Mar 2026

Does PSI Actually Catch Pricing Model Drift?

PSI detects covariate shift but not rank collapse. On a synthetic UK motor book where a new risk factor emerges post-deployment, PSI stays GREEN while Gini drops 8 points. The B...
28 Mar 2026

Does Proxy Discrimination Testing Actually Work?

Manual Spearman correlation missed postcode as an ethnicity proxy in 100% of 50 benchmark runs. CatBoost proxy R-squared caught it in 100% of runs. The difference is the non-lin...
28 Mar 2026

Does Monotonicity-Constrained EBM Actually Work for Insurance Pricing?

On a UK motor DGP with a monotone young-driver requirement, unconstrained EBM violates monotonicity in 31% of runs. Constrained EBM matches GLM monotonicity compliance at 100% w...
28 Mar 2026

Does HMM Telematics Risk Scoring Actually Work for Insurance Pricing?

HMM-derived driving state features improve Gini by 5–10 percentage points over raw trip averages on a state-structured DGP. The reason is temporal: the HMM knows that aggressive...
28 Mar 2026

Does Constrained Rate Optimisation Actually Work?

We benchmarked constrained portfolio optimisation against a uniform +7% rate change on a 2,000-policy UK motor book. The optimiser achieved the same GWP target with £4,000–8,000...
28 Mar 2026

Does Bühlmann-Straub Credibility Actually Work?

We benchmarked Bühlmann-Straub credibility against raw experience and manual Z-factors on a 30-segment synthetic UK motor fleet book with a known DGP. On thin schemes, it reduce...
28 Mar 2026

Does Automatic Lambda Selection for Whittaker-Henderson Actually Work?

REML-selected lambda beats manual tuning on a 63-band age curve benchmark: 22% lower MSE on thin tail bands, zero analyst discretion, and principled credible intervals. The hone...
27 Mar 2026

Does Automated Model Monitoring Actually Work?

We planted three simultaneous model failures in a 50,000-policy UK motor book. The aggregate A/E never triggered. The library detected the first problem after 1,500 policies. He...
26 Mar 2026

Does Conformal Prediction Actually Work for Insurance Claims?

Parametric Tweedie intervals undercover high-risk policies by 10–15 percentage points. We tested conformal prediction on 50,000 UK motor policies to find out whether the fix act...
25 Mar 2026

How to Quantify What a Model Improvement Is Worth in Pounds

A 5pp Gini improvement means nothing to a CFO. The Loss Ratio Error framework from arXiv:2512.03242 converts model correlation into expected loss ratio — and from there into pou...
25 Mar 2026

Does DML Causal Inference Actually Work for Insurance Pricing?

We ran Double Machine Learning against a naive GLM on a 50,000-policy UK motor telematics book. The GLM overestimated the treatment effect by 50–90%. Here is what that means for...
24 Mar 2026

The Python Insurance Pricing Benchmark: GLM vs XGBoost vs CatBoost vs LightGBM on freMTPL2

Definitive Python benchmark: Poisson GLM vs XGBoost vs CatBoost vs LightGBM for insurance frequency modelling on freMTPL2. Poisson deviance, Gini coefficient, and A/E calibratio...
24 Mar 2026

Does insurance-gam actually work for insurance pricing?

Benchmark results on a known-DGP synthetic UK motor book. EBM beats the GLM by 12.6 Gini points (0.455 vs 0.329). But the deviance number is misleading. We explain why, and when...
24 Mar 2026

Does HMM telematics scoring actually work for insurance pricing?

Benchmark results on a known-DGP synthetic UK motor fleet. HMM state fractions deliver 5–10pp Gini lift over simple aggregates. State classification recovers >50% of true high-r...
24 Mar 2026

Does GBM-to-GLM Distillation Actually Work for Insurance Pricing?

Honest benchmark: does fitting a surrogate GLM on CatBoost pseudo-predictions recover more discriminatory power than a direct GLM? We test it on 30,000 synthetic UK motor policies.
23 Mar 2026

Exposure-Weighted Gini Coefficient in Python

Exposure-weighted Gini for insurance pricing: correct formula, Python implementation, and why ignoring exposure distorts motor model governance.
23 Mar 2026

Does Whittaker-Henderson smoothing actually work for insurance pricing?

Benchmark results on a known-DGP synthetic UK motor age curve. REML recovers the true frequency well in the data-rich middle. The tails are a different story. Numbers, not claims.
23 Mar 2026

Does automated model monitoring actually work for insurance pricing?

Aggregate A/E at 0.94 looks fine. The model has been mispricing under-25s for eight months. Benchmark results on a synthetic UK motor book with three planted failure modes.
23 Mar 2026

Does DML causal inference actually work for insurance pricing?

We ran the benchmarks. On a synthetic UK motor book with nonlinear confounding, naive logistic GLM overestimates the telematics treatment effect by 50–90%. DML recovers the grou...
23 Mar 2026

Does conformal prediction actually work for insurance pricing?

Benchmark results on a known-DGP synthetic motor book. Conformal hits 90% across all deciles. Parametric Tweedie under-covers the top decile by 10–15pp. Numbers, not theory.
23 Mar 2026

Does Bühlmann-Straub credibility actually work for insurance pricing?

Benchmark results on 100 synthetic schemes with known true loss rates. Credibility blending reduces MSE by 25–35% vs the best naive alternative. Numbers, not theory.
21 Mar 2026

Why k-Fold CV Is Wrong for Insurance and What to Do Instead

Insurance walk-forward cross-validation prevents the look-ahead bias that makes standard k-fold results useless for prospective evaluation. Complete Python example with insuranc...
13 Mar 2026

Foundation Models for Thin Segments: TabPFN and TabICLv2 in Insurance Pricing

TabPFN and TabICLv2 for thin-segment UK insurance pricing. In-context learning at inference, no gradient descent. insurance-thin-data wraps both for actuaries.
12 Mar 2026

GARCH for Claims Inflation: Modelling Volatility That Clusters

GARCH for UK insurance claims inflation: time-varying variance in trend analysis. insurance-garch: Engle (1982) applied to actuarial trend and pricing models.
09 Mar 2026

Double Machine Learning for Insurance Pricing: Benchmarks and Pitfalls

Where double machine learning beats naive regression for insurance pricing — and where it does not. Benchmarks on 100,000-policy synthetic UK motor data with known ground truth....
08 Mar 2026

How Do You Know Your Sigma Model Is Working?

Three diagnostics prove a GAMLSS sigma submodel is real: quantile residuals, worm plots, split-sample calibration. From insurance-distributional-glm.
06 Mar 2026

Density Ratio Detection for Channel Mix Drift: Correcting Predictions Before the Loss Ratio Reacts

When a new aggregator partnership or competitor exit changes your new business mix, models trained on the old distribution misprice silently.
13 Jul 2025

Your Model Validation Is a Checklist, Not a Test

PRA SS1/23 requires quantitative pass/fail tests, not narrative. insurance-governance automates the full validation suite and generates auditable HTML reports.