Deep dives into individual pricing methods: DML, conformal prediction, credibility, copulas, spatial models, GAMs, normalising flows, and more. Each post covers the methodology, its assumptions, and where standard practice gets it wrong.
The Python equivalent of the IFoA MLR Working Party's R tutorial: Poisson GLM baseline, EBM GAM, and CatBoost GBM on UK motor data, with the full pipeline from data to governance.
Benchmark results on a known-DGP synthetic UK motor book. EBM beats the GLM by 35 Gini points. But the deviance number is misleading. We explain why, and when you should care.
Benchmark results on a known-DGP synthetic UK motor fleet. HMM state fractions deliver 5–10pp Gini lift over simple aggregates. State classification recovers >50% of true high-r...
insurance-deploy provides the champion/challenger infrastructure, audit trail, and ICOBS 6B compliance tooling that MLflow does not. Here is how to use it.
Reproduce an Emblem frequency-severity GLM in Python: factor tables, one-way plots, deviance residuals, and lift charts using statsmodels, CatBoost, and Polars.
Which GLM assumptions actually matter for insurance pricing, which ones you routinely violate without consequence, and the diagnostics worth running before signing off a product...
A practical walkthrough for pricing analysts: use insurance-causal for causal inference, insurance-conformal for prediction intervals, and insurance-monitoring for drift detecti...
Benchmark results on a known-DGP synthetic UK motor age curve. REML recovers the true frequency well in the data-rich middle. The tails are a different story. Numbers, not claims.
We read the source, ran the benchmark, and checked the claim: the independence assumption in standard two-part GLMs is wrong for UK motor, and this library corrects it analytica...
We ran the benchmarks. On a synthetic UK motor book with nonlinear confounding, a naive logistic GLM overestimates the telematics treatment effect by 50–90%. DML recovers the grou...
Benchmark results on a known-DGP synthetic motor book. Conformal hits 90% across all deciles. Parametric Tweedie under-covers the top decile by 10–15pp. Numbers, not theory.
Benchmark results on 100 synthetic schemes with known true loss rates. Credibility blending reduces MSE by 25–35% vs the best naive alternative. Numbers, not theory.
CatBoost motor frequency model to Radar factor table in 45 minutes: SHAP relativity extraction, GLM distillation, and Radar-compatible CSV export in Python.
Build a burning cost model in Python: frequency-severity split, exposure offsets, large loss capping, IBNR adjustment, and combined pure premium for UK pricing.
Mutual information, proxy R-squared, and SHAP proxy scores all flag proxy discrimination but catch different things. A practical guide to interpreting conflicting signals in ins...
sklearn's TweedieRegressor is a well-engineered GLM. It fits a fixed-power Tweedie model correctly. The problem is that insurance pricing needs per-risk variance, not a single p...
MAPIE is the standard Python library for conformal prediction, but it wasn't designed for insurance. Here is what goes wrong with exposure-weighted portfolios and Tweedie models...
Tutorial on monitoring insurance pricing models using actuarial KPIs. Gini tracking, segmented A/E, double-lift for champion/challenger. Why generic drift tools miss what matters.
EP25/2 requires non-life insurers to test whether rating factors act as proxies for protected characteristics. Here is exactly how to run that test in Python, with the correct s...
EconML is the standard Python library for causal ML. It was not built for insurance pricing, Poisson/Gamma exposure models, or the dual-selection bias problems specific to renew...
DoWhy is the most rigorous general-purpose causal inference library in Python — DAG specification, formal identification, refutation tests. It was not built for insurance pricin...
Build a CatBoost frequency-severity pricing model on freMTPL2 using Polars. Poisson frequency, Gamma severity, combined burning cost, SHAP factor extraction, and distillation to...
Step-by-step CatBoost frequency-severity model on freMTPL2. Poisson + Gamma, burning cost combination, SHAP factor tables, and GLM distillation for Radar.
insurance-causal v0.3.1 fixes over-partialling in DML for small insurance books. Adaptive CatBoost regularisation makes causal estimates reliable at n≥1k.
Static credibility weights all years equally. The dynamic Poisson-gamma state-space model weights recent experience more - and quantifies how much more.
GLMComparison and MonotonicityEditor in insurance-gam close the governance gap between EBM shape functions and the GLM factor table your pricing committee...
How to run covariate shift detection as a recurring monthly check: monitoring cadence, ESS ratio trends, and the thresholds that trigger a retraining...
Three interpretable architectures for UK insurance pricing: EBM, ANAM, and PIN via insurance-gam. Refuse the GLM-vs-GBM accuracy trade-off with factor tables.
CausalForestDML separates causal price effect from risk-lapse correlation in UK motor renewal. insurance-elasticity - per-customer CATE and ENBP optimiser.
Two-stage CatBoost plus REML random effects for UK insurance broker adjustments. insurance-multilevel - Bühlmann-Straub credibility weighting, not guesswork.
Continuous-time HMM for telematics risk scoring in UK motor pricing. Latent driving regimes from GPS data - actuarially interpretable features for Poisson GLM.
TabPFN and TabICLv2 for thin-segment UK insurance pricing. In-context learning at inference, no gradient descent. insurance-thin-data wraps both for actuaries.
Joint conformal prediction sets for frequency and severity in UK insurance. Fan and Sesia coordinate-wise standardisation - simultaneous coverage across both.
Shared-trunk neural model for frequency-severity dependence in UK motor pricing. Explicit dependence testing where two-part GLMs assume independence - Python.
Correct covariate shift when acquiring an MGA book for UK motor pricing. Importance weighting, density ratio estimation, segment-level diagnostics - Python.
Conformal risk control for UK insurance: coverage calibrated to financial shortfall, not miscoverage rate. insurance-conformal - beyond standard intervals.
Logistic regression treats all non-lapsers the same. Mixture cure models split them into two groups: structural non-lapsers who will never leave, and...
How to convert raw telematics trip data into GLM-ready features for UK motor pricing. Covers HMM state segmentation and score calibration to GLM relativities.
Doubly robust TMLE for insurance pricing with Poisson outcomes and exposure offsets. insurance-tmle - first Python library with the implementation AIPW lacks.
Neural Spline Flows for bimodal UK motor BI severity - no family assumption. insurance-nflow: TVaR, ILF curves, reinsurance layer costs, fat-tail transform.
Joint longitudinal-survival model for telematics: driving trajectory not current score. insurance-jlm - Wulfsohn-Tsiatis SREM with mid-term repricing in Python.
GARCH for UK insurance claims inflation: time-varying variance in trend analysis. insurance-garch - Engle (1982) applied to actuarial trend and pricing models.
Vine copulas for multi-peril UK home pricing. Flood-subsidence correlation costs ~9% in mispriced revenue. insurance-copula: BIC selection, PML simulation.
Fine-Gray subdistribution hazard for UK insurance competing risks. Separates lapse, MTC, and NTU correctly - insurance-survival Python, not naive censoring.
Causal Forests with Fixed Effects for UK insurance panel data. Rate change evaluation by segment - beyond before-and-after loss ratios. causalfe Python.
Bayesian Causal Forests for heterogeneous lapse effects in UK insurance pricing. Segment-level elasticity with posteriors - insurance-bcf wrapping stochtree.
Automatic Debiased ML via Riesz Representers for continuous price elasticity. insurance-causal - no GPS density blow-up at tails. UK personal lines Python.
Transfer learning for thin-segment UK insurance pricing: Tian-Feng GLM algorithm, CatBoost source-as-offset, CANN fine-tuning, negative transfer diagnostics.
Survival models for UK personal lines retention: cure models, survival-adjusted CLV, actuarial lapse tables, MLflow deployment. What lifelines does not do.
Regression Discontinuity Design tests whether UK motor risk drops at age 25. Exposure-weighted Poisson outcomes, geographic boundaries, Consumer Duty output.
Double GLM gives every UK insurance policy its own dispersion parameter. insurance-dispersion - policy-level Solvency II variance and risk-adequate loading.
Standard Gamma GLMs assign one dispersion parameter to every policy. That is wrong for most UK books. GAMLSS sigma submodels and Tweedie p estimation fix it.
GLMTransfer borrows statistical strength from a related source book to price thin target segments. Motor-to-fleet, home-to-landlord, and fleet roll-outs.
GAMLSS in Python: seven families, RS algorithm, variance as function of covariates. insurance-distributional-glm - the actuarial implementation Python lacked.
Distributional Refinement Networks wrap any GLM to produce a full predictive distribution. insurance-severity - neural severity modelling for UK motor pricing.
Handle 800+ vehicle makes and 9,000+ postcode sectors in a multiplicative GLM using neural embeddings and spatial clustering. Auditable Python pipeline.
Actuarially faithful synthetic data via vine copulas and AIC-selected marginals. insurance-synthetic fixes Poisson semantics and tail behaviour SDV gets wrong.
CatBoost MultiQuantile plus actuarial output layer: TVaR, ILFs, large loss loadings, exceedance probabilities for UK insurance pricing. insurance-quantile.
ICC diagnostics for multiple group factors in insurance pricing. When broker, scheme, fleet, and postcode sector effects are worth modelling with REML...
insurance-distributional models the full conditional loss distribution, not just the mean. First open-source Python implementation of the ASTIN 2024 Best Paper.
Per-risk large loss loadings for UK home insurance using quantile GBMs. Avoids the flat-loading trap by making the loading a function of the risk itself.
Step-by-step tutorial: plant two interactions in synthetic motor data, detect them with CANN + NID, validate with SHAP, confirm with A/E surfaces, and...
Detect and correct proxy discrimination in UK insurance using SHAP and insurance-fairness. Protected characteristic leakage detection under FCA Consumer Duty.
The foundational walkthrough for insurance-covariate-shift: density ratio estimation, ESS/KL diagnostics, importance weighting, shift-robust conformal...
Python library distilling CatBoost GBMs into multiplicative GLM factor tables for Radar and Emblem. Open-source GBM-to-GLM distillation for UK pricing teams.
Bühlmann-Straub vs CatBoost vs two-stage multilevel for UK motor pricing: when each wins and how insurance-credibility and insurance-multilevel combine them.
Assumes familiarity with the Murphy decomposition framework. Focuses on the operational question: given a monitoring alert, how do you read GMCB vs LMCB...
Automated interaction search for UK motor GLMs using CANN residuals and NID. Bonferroni-corrected shortlist before manual testing - insurance-interactions.
BYM2 spatial model in PyMC for UK territory ratemaking. Borrows strength across neighbouring postcode sectors - statistically correct vs k-means banding.
Distribution-free conformal prediction intervals for insurance GBMs. Per-risk coverage guarantees, not confidence intervals for the mean. Python library.
Bühlmann-Straub credibility in Python for UK personal lines. Blend thin-segment experience with portfolio rates - mathematically equivalent to mixed models.
Extract multiplicative rating relativities from CatBoost using SHAP - same exp(beta) format as a GLM. UK personal lines Python with confidence intervals.
Partial pooling for thin rating cells in UK motor pricing. bayesian-pricing stabilises sparse segments with hierarchical Bayesian models - no data discarded.
OLS elasticity in formula-rated books is contaminated by your own risk model. insurance-elasticity fixes this with CausalForestDML and CatBoost nuisance.
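Several posts listed above rest on credibility blending: pulling a thin segment's observed rate towards the portfolio rate in proportion to how much exposure the segment has. A minimal sketch of that idea in plain Python follows. It is not the API of any library named on this page. The function name `credibility_blend` and the parameter `k` (the ratio of within-segment variance to between-segment variance) are illustrative assumptions, and `k` is taken as given here rather than estimated from data.

```python
def credibility_blend(segment_rate, segment_exposure, portfolio_rate, k):
    """Blend a segment's observed rate with the portfolio rate.

    Hypothetical helper for illustration only.
    k is assumed known: expected process variance divided by the
    variance of hypothetical means. Larger k means less trust in
    the segment's own experience.
    """
    if segment_exposure < 0 or k <= 0:
        raise ValueError("exposure must be non-negative and k must be positive")
    # Credibility factor Z lies in [0, 1) and grows with exposure.
    z = segment_exposure / (segment_exposure + k)
    # Credibility-weighted average of own experience and portfolio rate.
    blended = z * segment_rate + (1.0 - z) * portfolio_rate
    return blended, z


# Illustrative inputs, not results from any post:
# a thin segment with 200 policy-years, observed frequency 0.12,
# against a portfolio frequency of 0.08 and an assumed k of 800.
blended, z = credibility_blend(0.12, 200.0, 0.08, 800.0)
print(f"Z = {z:.2f}, blended frequency = {blended:.3f}")
```

With these inputs the segment gets 20% credibility, so the blended rate sits much closer to the portfolio rate than to the segment's own experience. In practice `k` would be estimated from the data, which is the part the credibility posts cover.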