Correlation is cheap in insurance data. Most rating factors correlate with most others, and untangling which of them cause risk and which merely predict it is not something a GLM can do: a GLM coefficient on its own measures association conditional on the other terms in the model, not a causal effect. Double machine learning (DML) lets you estimate the causal effect of one specific factor (telematics score, NCD level, vehicle group) while controlling flexibly for confounders with a GBM. The result is a deconfounded coefficient with a standard error you can actually interpret, provided the confounders that matter are recorded in your data.

The second application, and the one that drives most adoption, is price elasticity for PS21/5 compliance. An OLS elasticity fitted to formula-rated renewal data does not give you what the FCA needs. The premium affects who lapses. But the premium is itself set from risk factors, and lapse propensity is correlated with risk, so OLS conflates demand response with adverse selection. CausalForestDML separates the two and returns heterogeneous treatment effects. The elasticity varies by customer segment, which is exactly the information a pricing actuary needs to set a compliant renewal uplift.

A third application is rate change evaluation. A standard before/after analysis of a rate change cannot separate the premium increase from everything else that changed over the same period. Synthetic difference-in-differences builds a control group from your own book. The controls are drawn from policies that received a different rate movement and weighted so that their pre-change trend tracks the treated cohort. The method then estimates the causal impact of the treated cohort’s uplift on lapses, conversions, and loss ratio.
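Here is a compact numpy sketch of synthetic difference-in-differences, following Arkhangelsky et al. (2021). It has three parts:

- Simplex-constrained unit weights make the weighted control cohorts track the treated cohorts before the change.
- Time weights pick out the pre-change months that best predict the post-change period.
- The estimate is a weighted double difference.

Everything here is hypothetical. There are 35 simulated cohorts (think region-by-channel cells) observed for 26 months. The last five cohorts receive an uplift after month 20 that adds 3 percentage points to their lapse rate, and the treated cohorts drift faster than the average control. Solving for the weights by plain projected gradient is a simplification chosen for brevity, not how any particular package does it.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto the simplex {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]
    return np.maximum(v - (css[rho] - 1) / (rho + 1), 0.0)


def simplex_ls(A, b, ridge, iters=10_000):
    """Minimise ||A w + w0 - b||^2 + ridge * ||w||^2 over simplex w and free intercept w0."""
    # Centring A's columns and b profiles out the free intercept w0.
    A = A - A.mean(axis=0)
    b = b - b.mean()
    # On the simplex sum(w) == 1, so subtracting each row mean from A and b leaves
    # A w - b unchanged; it only removes a common component that would shrink the step.
    row_mean = A.mean(axis=1)
    A, b = A - row_mean[:, None], b - row_mean
    w = np.full(A.shape[1], 1.0 / A.shape[1])
    step = 1.0 / (2 * (np.linalg.norm(A, 2) ** 2 + ridge))  # 1 / Lipschitz constant
    for _ in range(iters):
        grad = 2 * A.T @ (A @ w - b) + 2 * ridge * w
        w = project_simplex(w - step * grad)
    return w


def sdid(Y, n_treated, n_post):
    """Synthetic DiD on a (units, periods) panel whose last n_treated rows are
    treated in the last n_post periods."""
    n_ctrl, n_pre = Y.shape[0] - n_treated, Y.shape[1] - n_post
    ctrl_pre, ctrl_post = Y[:n_ctrl, :n_pre], Y[:n_ctrl, n_pre:].mean(axis=1)
    tr_pre, tr_post = Y[n_ctrl:, :n_pre].mean(axis=0), Y[n_ctrl:, n_pre:].mean()
    # Noise level and regularisation as in Arkhangelsky et al. (2021).
    sigma = np.std(np.diff(ctrl_pre, axis=1))
    zeta = (n_treated * n_post) ** 0.25 * sigma
    # Unit weights: controls whose pre-period path tracks the treated average.
    omega = simplex_ls(ctrl_pre.T, tr_pre, ridge=zeta ** 2 * n_pre)
    # Time weights: pre-periods that best predict the controls' post-period average.
    lam = simplex_ls(ctrl_pre, ctrl_post, ridge=(1e-6 * sigma) ** 2 * n_ctrl)
    # Weighted double difference.
    return (tr_post - tr_pre @ lam) - omega @ (ctrl_post - ctrl_pre @ lam)


# Simulated, hypothetical panel of monthly cohort lapse rates.
rng = np.random.default_rng(7)
n_ctrl, n_tr, n_pre, n_post = 30, 5, 20, 6
months = np.arange(n_pre + n_post)
level = 0.12 + rng.normal(0, 0.02, n_ctrl + n_tr)         # cohort base lapse rates
drift = np.concatenate([rng.uniform(0, 2, n_ctrl), np.full(n_tr, 1.5)])
trend = 0.04 * months / len(months)                       # market-wide lapse drift
season = 0.005 * np.sin(2 * np.pi * months / 12)
Y = level[:, None] + season + drift[:, None] * trend
Y += rng.normal(0, 0.002, Y.shape)
true_effect = 0.03
Y[n_ctrl:, n_pre:] += true_effect                         # uplift raises treated lapse rates

sdid_est = sdid(Y, n_tr, n_post)
did_est = (Y[n_ctrl:, n_pre:].mean() - Y[n_ctrl:, :n_pre].mean()) - (
    Y[:n_ctrl, n_pre:].mean() - Y[:n_ctrl, :n_pre].mean()
)
print(f"SDID: {sdid_est:.4f}  naive DiD: {did_est:.4f}  true: {true_effect}")
```

The naive difference-in-differences weights every control and every pre-change month equally. It therefore absorbs the treated cohorts' faster drift and overstates the effect. The synthetic version should land much closer to 0.03.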

Library: insurance-causal on GitHub · pip install insurance-causal


Tutorials and introductions


Techniques and extensions


Benchmarks and validation


Library comparisons