Correlation is cheap in insurance data. Every rating factor correlates with every other rating factor, and untangling which of them causes risk versus which merely predicts it is not something a GLM can do for you. Double machine learning (DML) lets you estimate the causal effect of a specific factor (telematics score, NCD level, vehicle group) while flexibly controlling for confounders using a gradient boosting machine (GBM). The result is a deconfounded coefficient with a standard error you can actually interpret.
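To make the mechanics concrete, here is a minimal sketch of cross-fitted DML for a partially linear model, written with scikit-learn rather than the insurance-causal API. The data is simulated, the "telematics score" treatment and the true effect of 0.5 are assumptions chosen for illustration, and `dml_partial_linear` is a hypothetical helper name.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold


def dml_partial_linear(y, t, X, n_splits=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*t + g(X) + e."""
    y_res = np.zeros_like(y, dtype=float)
    t_res = np.zeros_like(t, dtype=float)
    folds = KFold(n_splits, shuffle=True, random_state=seed)
    for train, test in folds.split(X):
        # Nuisance models are fitted on the training folds only,
        # then used to residualise the held-out fold.
        m_y = GradientBoostingRegressor(random_state=seed).fit(X[train], y[train])
        m_t = GradientBoostingRegressor(random_state=seed).fit(X[train], t[train])
        y_res[test] = y[test] - m_y.predict(X[test])
        t_res[test] = t[test] - m_t.predict(X[test])
    # Residual-on-residual regression gives the deconfounded coefficient.
    theta = (t_res @ y_res) / (t_res @ t_res)
    # Heteroskedasticity-robust (sandwich) standard error.
    eps = y_res - theta * t_res
    se = np.sqrt(np.sum((t_res * eps) ** 2)) / (t_res @ t_res)
    return theta, se


# Simulated book (all numbers are assumptions): X holds confounding rating
# factors, t is the factor of interest, and the true causal effect is 0.5.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
t = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(size=n)
y = 0.5 * t + 2.0 * np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(size=n)

naive = np.polyfit(t, y, 1)[0]          # slope of y on t, ignoring confounders
theta_hat, se_hat = dml_partial_linear(y, t, X)
print(f"naive slope: {naive:.2f}")
print(f"DML estimate: {theta_hat:.2f} (se {se_hat:.3f})")
```

The naive slope absorbs the part of `X` that drives both the factor and the outcome. The cross-fitted estimate should land much closer to the assumed 0.5, because both sides are residualised on `X` before the final regression.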
The application that drives most adoption is price elasticity for PS21/5 compliance. OLS elasticity on formula-rated renewal data does not give you what the FCA needs. The premium affects who lapses, and lapse propensity is correlated with risk — so OLS conflates demand response with adverse selection. CausalForestDML separates these and gives you heterogeneous treatment effects: the elasticity varies by customer segment, which is exactly the information a pricing actuary needs to set a compliant renewal uplift.
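The confounding described above can be reproduced in a few lines of NumPy. This sketch is not CausalForestDML: it swaps in a much cruder technique, linear partialling out (Frisch-Waugh) run separately within two hypothetical segments, purely to show why the pooled OLS slope is wrong and why a single average hides segment differences. Every number in the simulation is an assumption, including the positive link between risk and lapse.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
segment = rng.integers(0, 2, size=n)   # hypothetical: 0 = loyal, 1 = price-sensitive
risk = rng.normal(size=n)              # observed risk score (the confounder)

# Formula rating: riskier policies receive larger renewal uplifts.
price_change = 0.05 + 0.03 * risk + rng.normal(scale=0.02, size=n)

# Assumed true lapse response per unit of price change, by segment.
true_effect = np.where(segment == 1, 3.0, 1.0)

# Assumption: riskier customers also lapse more, whatever the price.
lapse_score = (0.10 + true_effect * price_change
               + 0.05 * risk + rng.normal(scale=0.05, size=n))


def residualise(v, controls):
    """Remove the part of v explained linearly by the controls."""
    Z = np.column_stack([np.ones(len(v)), controls])
    beta, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ beta


def partialled_out_slope(y, t, controls):
    """Residual-on-residual slope of y on t after removing the controls."""
    y_res = residualise(y, controls)
    t_res = residualise(t, controls)
    return (t_res @ y_res) / (t_res @ t_res)


# Pooled OLS of lapse on price mixes demand response with risk selection.
naive = np.polyfit(price_change, lapse_score, 1)[0]

# Segment-level estimates after partialling out risk.
by_segment = {
    s: partialled_out_slope(lapse_score[segment == s],
                            price_change[segment == s],
                            risk[segment == s])
    for s in (0, 1)
}
print(f"pooled OLS slope: {naive:.2f}")
print({s: round(v, 2) for s, v in by_segment.items()})
```

In this simulation the pooled slope overstates the average response because price and lapse both rise with risk, while the segment-level estimates recover the two assumed effects. A causal forest replaces the hand-picked segments and the linear adjustment with data-driven splits and flexible nuisance models.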
A third application is rate change evaluation. Standard before/after analysis of a rate change cannot isolate the premium increase from everything else that was changing at the same time. Synthetic difference-in-differences builds a weighted control group from your own book, using policies that received a different rate movement, and estimates the causal impact of the treated cohort’s uplift on lapses, conversions, and loss ratio.
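The gap between a before/after comparison and a difference-in-differences estimate is easy to see on simulated data. The sketch below is a plain 2x2 DiD in NumPy, not the synthetic variant, which additionally reweights control policies and pre-periods so the control group tracks the treated cohort before the change. The lapse rates, the market-wide shift of 3 points, and the true effect of 4 points are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
treated = rng.integers(0, 2, size=n).astype(bool)  # cohort that got the rate increase
post = rng.integers(0, 2, size=n).astype(bool)     # renewal falls after the change

base = 0.10 + 0.02 * treated        # cohorts may differ in baseline lapse rate
market_shift = 0.03 * post          # something else changed for everyone
effect = 0.04 * (treated & post)    # assumed causal effect of the uplift
lapsed = rng.random(n) < base + market_shift + effect


def lapse_rate(mask):
    """Observed lapse rate among the policies selected by mask."""
    return lapsed[mask].mean()


# Before/after on the treated cohort picks up the market shift as well.
before_after = lapse_rate(treated & post) - lapse_rate(treated & ~post)

# DiD subtracts the change seen in the control cohort over the same period.
control_change = lapse_rate(~treated & post) - lapse_rate(~treated & ~post)
did = before_after - control_change

print(f"before/after: {before_after:.3f}")
print(f"difference-in-differences: {did:.3f}")
```

The before/after figure attributes the market-wide shift to the rate change. Subtracting the control cohort's change removes it, provided the two cohorts would have moved in parallel without the uplift.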
Library: insurance-causal on GitHub · pip install insurance-causal
Tutorials and introductions
- Your Elasticity Estimate Is Biased and You Already Know Why — the foundational tutorial: DML price elasticity on a UK renewal book, with FCA PS21/5 compliance framing
- Causal AI for Pricing Actuaries: A Practical Guide — broader introduction to causal methods for pricing teams: DML, causal forests, DiD, interrupted time series
- OLS Elasticity in a Formula-Rated Book Measures the Wrong Thing — why naive elasticity estimation fails and how DML fixes it
Techniques and extensions
- DML Works at 1,000 Policies Now. Here Is What Changed. — adaptive regularisation for thin segments where standard DML overfits the nuisance models
- Your Pricing Model Knows the Average Effect. That Is Not Enough. — causal forests for heterogeneous treatment effects across the portfolio
- Your Pricing Model Knows the Average. Your Customers Don’t Care About the Average. — GATES and CLAN for characterising which segments are most price-sensitive
- Heterogeneous Lapse Effects with Bayesian Causal Forests: Beyond the Average Elasticity — BCF with BART for credible intervals on heterogeneous effects in small samples
- Rate Change Evaluation: Did the Premium Increase Cause the Lapses? — applying DiD to lapse attribution after a renewal rate change
- Synthetic Difference-in-Differences for Rate Change Evaluation — SDiD using insurance-causal-policy for FCA evidence packs
Benchmarks and validation
- Double Machine Learning for Insurance Pricing: Benchmarks and Pitfalls — simulation results on synthetic UK motor data with known treatment effects
- Does DML causal inference actually work for insurance pricing? — empirical validation of DML on realistic insurance data
- Does DML Causal Inference Actually Work for Insurance Pricing? — extended benchmark with GLM comparison
Library comparisons
- DoWhy vs insurance-causal: Which Causal Inference Library Should Insurers Use? — DoWhy’s DAG-based approach versus DML for insurance pricing tasks
- EconML vs insurance-causal: Causal Inference for Insurance Pricing — EconML is the upstream dependency; this explains what insurance-causal adds on top