Burning Cost - ML and data science research in UK personal lines insurance

Your GBM outperforms.
Your GLM is still live.

Ten libraries for the hard problems in UK pricing. Free, MIT-licensed, Databricks-native. All libraries also work standalone via pip — Databricks is optional.

Built for practitioners, used by practitioners

25,000 downloads in March 2026 across 34 libraries

Real adoption from pricing teams: each download is a pip install on someone's work machine or Databricks cluster. Figures are without-mirrors counts, with CI bots and PyPI mirrors stripped out.

Library                   Downloads / month
insurance-causal                      2,003
insurance-fairness                    1,992
insurance-monitoring                  1,622
insurance-causal-policy               1,495
insurance-gam                         1,120
insurance-optimise                    1,099
insurance-conformal                   1,010
insurance-quantile                      968
insurance-credibility                   909
insurance-severity                      846
Top 10 total                         13,064

Five libraries, one dataset, one notebook

A runnable example that fits a CatBoost frequency model, extracts SHAP factor tables, runs a proxy discrimination audit, monitors for drift, and attaches conformal prediction intervals — all on the same synthetic UK motor dataset. Opens in Google Colab with no local setup required.

The problem we solve

The missing piece is not technical skill. It is tooling that bridges the GBM you have built and the GLM your rating engine still runs.

Most UK pricing teams have adopted GBMs but are still taking GLM outputs to production. The GBM sits on a server outperforming the production model, but the outputs are not in a form that a rating engine, regulator, or pricing committee can work with. The model never makes it to rates.

Each library here solves one specific problem in the pricing workflow. Actuarial tests are included. Outputs use the formats pricing teams already recognise: factor tables, Lorenz curves, A/E ratios, movement-capped rate changes.
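To make two of those formats concrete, here is a minimal numpy sketch of a segment-level A/E ratio and a movement-capped rate change. The data and the helper names (`ae_by_segment`, `capped_rate_change`) are invented for illustration and are not taken from any of the libraries; capping by clipping each segment's indicated change is one simple convention among several.

```python
import numpy as np

# Invented portfolio: one row per policy
segment = np.array(["A", "A", "B", "B", "C", "C"])
actual_claims = np.array([1.0, 0.0, 2.0, 1.0, 0.0, 1.0])
expected_claims = np.array([0.6, 0.5, 1.2, 1.1, 0.4, 0.5])


def ae_by_segment(segment, actual, expected):
    """Actual / expected ratio per segment: sum of actuals over sum of expecteds."""
    return {
        str(s): actual[segment == s].sum() / expected[segment == s].sum()
        for s in np.unique(segment)
    }


def capped_rate_change(current, indicated, max_movement=0.10):
    """Move each rate toward its indicated level, clipping the change to +/- max_movement."""
    change = indicated / current - 1.0
    return current * (1.0 + np.clip(change, -max_movement, max_movement))


ae = ae_by_segment(segment, actual_claims, expected_claims)
print({k: round(v, 3) for k, v in ae.items()})

current = np.array([400.0, 600.0, 900.0])
indicated = np.array([460.0, 580.0, 990.0])
new = capped_rate_change(current, indicated)
print(np.round(new, 2))  # first segment is held to +10% (440 rather than 460)
```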

sklearn-compatible where it matters. Documented by people who have sat in the same sign-off meetings you have.

See it in practice

Three lines to a factor table. Five to validated splits.

Real API calls from the libraries. Not wrappers around wrappers. Each one does the specific thing a pricing team needs.

from shap_relativities import SHAPRelativities

sr = SHAPRelativities(model, X_train)
factors = sr.fit_transform(X_test)

# Returns multiplicative factor tables in GLM format
# Same structure as exp(beta) from your Emblem model
factors.head()
#  vehicle_age  relativity  ci_lower  ci_upper
#  0            1.000       0.982     1.018
#  1            0.912       0.901     0.923
#  ...
#  3            0.793       0.780     0.807
#  4+           0.754       0.739     0.769

Factor tables, confidence intervals, exposure weighting, reconstruction validation. Output goes straight into a pricing committee pack.
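For a model with a log link, SHAP values are additive on the log scale, so one common way to turn them into GLM-style relativities is to take the exposure-weighted mean SHAP value per level and exponentiate its difference from the base level. The sketch below illustrates that idea with numpy on synthetic data where the answer is known. `shap_to_relativities` is a hypothetical helper, not the shap_relativities implementation, and the SHAP values are computed analytically for a one-feature log-linear model rather than with the shap package.

```python
import numpy as np


def shap_to_relativities(levels, shap_log, exposure, base_level):
    """Exposure-weighted mean log-scale SHAP value per level, exponentiated
    relative to the base level (hypothetical helper, not the library API)."""
    means = {
        int(lvl): np.average(shap_log[levels == lvl], weights=exposure[levels == lvl])
        for lvl in np.unique(levels)
    }
    return {lvl: float(np.exp(m - means[base_level])) for lvl, m in means.items()}


rng = np.random.default_rng(0)
n = 10_000
levels = rng.integers(0, 4, size=n)        # vehicle_age bands 0..3 (synthetic)
exposure = rng.uniform(0.1, 1.0, size=n)   # policy years
beta = np.array([0.0, -0.1, -0.2, -0.25])  # true log-scale effects (invented)
log_pred = np.log(0.08) + beta[levels]     # a model that is additive in log space

# With a single feature, the SHAP value is the prediction minus its mean over
# the background data, and the base value is that mean.
base_value = log_pred.mean()
shap_log = log_pred - base_value

# Reconstruction check: base value plus SHAP values recovers the log prediction
assert np.allclose(base_value + shap_log, log_pred)

rels = shap_to_relativities(levels, shap_log, exposure, base_level=0)
print({k: round(v, 3) for k, v in rels.items()})
# {0: 1.0, 1: 0.905, 2: 0.819, 3: 0.779}, i.e. exp(beta)
```

With several correlated features the SHAP attribution is no longer constant within a level, which is where exposure weighting and the reconstruction check start to earn their keep.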

from insurance_cv import InsuranceTemporalCV
from sklearn.model_selection import cross_val_score

cv = InsuranceTemporalCV(
    n_splits=5,
    ibnr_buffer_months=6
)
scores = cross_val_score(
    model, X, y,
    cv=cv,
    scoring="neg_mean_poisson_deviance"  # sklearn has no "poisson_deviance" scorer
)
deviance = -scores  # scorers are "higher is better", so the deviance comes back negated

# Walk-forward splits: no future data leaks into training folds
# IBNR buffer prevents immature periods contaminating validation
print(f"CV deviance: {deviance.mean():.4f} ± {deviance.std():.4f}")

Walk-forward splits with configurable IBNR buffers. Temporally correct: no future data leaks into training folds. sklearn-compatible API.
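The idea behind walk-forward splits with a buffer can be sketched in plain numpy. This is our own generator, not InsuranceTemporalCV. We assume the IBNR buffer means a run of months excluded between the end of each training window and the start of its validation window, analogous to the `gap` argument of scikit-learn's `TimeSeriesSplit`, which counts rows rather than months.

```python
import numpy as np


def walk_forward_splits(months, n_splits=5, buffer_months=6):
    """Yield (train_idx, val_idx) for expanding-window walk-forward CV.

    months: integer period index per row (e.g. underwriting month).
    buffer_months: periods left out between the end of training and the
    start of validation (our stand-in for an IBNR buffer).
    """
    months = np.asarray(months)
    periods = np.unique(months)
    n_val = (len(periods) - buffer_months) // (n_splits + 1)
    if n_val < 1:
        raise ValueError("not enough periods for this many splits and buffer")
    for k in range(n_splits):
        val_end = len(periods) - (n_splits - 1 - k) * n_val
        val_start = val_end - n_val
        train_end = val_start - buffer_months
        train_idx = np.flatnonzero(np.isin(months, periods[:train_end]))
        val_idx = np.flatnonzero(np.isin(months, periods[val_start:val_end]))
        yield train_idx, val_idx


months = np.repeat(np.arange(36), 50)  # 36 months, 50 policies each (synthetic)
for train_idx, val_idx in walk_forward_splits(months):
    print("train up to", months[train_idx].max(),
          "| validate", months[val_idx].min(), "to", months[val_idx].max())
```

On 36 months of data the first split trains on months 0 to 4, skips 5 to 10, and validates on 11 to 15. Because scikit-learn's `cross_val_score` accepts an iterable of (train, test) index pairs as `cv`, `list(walk_forward_splits(months))` can be passed in directly.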

from insurance_optimise import RateOptimiser

opt = RateOptimiser(
    current_rates,
    technical_rates,
    exposure
)
result = opt.optimise(
    max_movement=0.10,
    target_lr_improvement=0.03
)

# Efficient frontier as a linear programme
# Respects ±10% movement cap per segment
print(f"LR improvement: {result.lr_delta:.1%}")
# LR improvement: 2.8%  (within movement constraints)

Formulates the efficient frontier as a linear programme. Respects movement caps per segment, targets aggregate loss ratio improvement.
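One way to pose a movement-capped rate change as a linear programme, sketched with scipy's `linprog` on invented numbers. The objective (stay as close as possible to technical rates, weighted by exposure) and the constraint structure are our assumptions, not RateOptimiser's formulation. Fixing expected losses turns the loss ratio target into a linear floor on premium, and re-solving across a range of targets would trace out a frontier.

```python
import numpy as np
from scipy.optimize import linprog

# Invented three-segment example
current = np.array([400.0, 600.0, 900.0])     # current average premium per policy
technical = np.array([440.0, 580.0, 1000.0])  # technical (cost-based) premium
exposure = np.array([1000.0, 800.0, 300.0])   # policies per segment
losses = 0.7 * technical * exposure           # expected claims cost per segment

max_movement = 0.10
current_lr = losses.sum() / (current * exposure).sum()
target_lr = current_lr - 0.03                 # 3-point loss ratio improvement
n = len(current)

# Variables x = [m_1..m_n, d_1..d_n]
#   m_i: new rate / current rate for segment i
#   d_i: bounds |current_i * m_i - technical_i| from above
c = np.concatenate([np.zeros(n), exposure])   # minimise exposure-weighted distance
A_ub = np.vstack([
    np.hstack([np.diag(current), -np.eye(n)]),             #  current*m - d <= technical
    np.hstack([-np.diag(current), -np.eye(n)]),            # -current*m - d <= -technical
    np.concatenate([-(current * exposure), np.zeros(n)]),  # premium >= losses / target LR
])
b_ub = np.concatenate([technical, -technical, [-losses.sum() / target_lr]])
bounds = [(1 - max_movement, 1 + max_movement)] * n + [(0, None)] * n

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
m = res.x[:n]
new_lr = losses.sum() / (current * m * exposure).sum()
print(f"Rate multipliers: {np.round(m, 4)}")
print(f"LR improvement: {current_lr - new_lr:.1%}")
```

On these numbers the +10% cap stops the third segment short of its technical rate, and the loss ratio constraint turns out not to bind; tighter targets or caps would change that.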

Who this is for

Built for people who know the problem from the inside

These libraries assume you understand insurance pricing. They do not explain what a GLM is.

Pricing actuaries moving from Emblem or Radar to Python

You know the techniques. These libraries give you Python equivalents that produce outputs in the same formats you already use: factor tables, A/E ratios, Lorenz curves.
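For readers new to the Python side, an exposure-weighted ordered Lorenz curve and its Gini coefficient take only a few lines of numpy. This is a generic sketch on synthetic data, not any library's implementation. We sort ascending by prediction, so a model that ranks risk well gives a curve below the diagonal and a positive Gini; conventions vary between teams.

```python
import numpy as np


def lorenz_curve(y_true, y_pred, exposure):
    """Sort policies by predicted rate (low to high); return cumulative shares
    of exposure (x) and of actual claims (y), starting from (0, 0)."""
    order = np.argsort(y_pred)
    x = np.concatenate([[0.0], np.cumsum(exposure[order])]) / exposure.sum()
    y = np.concatenate([[0.0], np.cumsum(y_true[order])]) / y_true.sum()
    return x, y


def gini(y_true, y_pred, exposure):
    """Gini = 1 - 2 * area under the Lorenz curve (trapezium rule)."""
    x, y = lorenz_curve(y_true, y_pred, exposure)
    area = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0)
    return 1.0 - 2.0 * area


rng = np.random.default_rng(1)
n = 50_000
exposure = rng.uniform(0.2, 1.0, n)
true_rate = rng.lognormal(mean=np.log(0.1), sigma=0.5, size=n)
claims = rng.poisson(true_rate * exposure)

g_model = gini(claims, true_rate, exposure)    # perfect ranking of the true rate
g_random = gini(claims, rng.random(n), exposure)  # no information
print(f"Gini, true rate as prediction: {g_model:.3f}")
print(f"Gini, random prediction:       {g_random:.3f}")
```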

Data scientists joining an insurance pricing team

You have the ML skills but lack the actuarial context. These libraries encode that context: IBNR-aware cross-validation, credibility-weighted factors, and fairness tests that map to FCA requirements.
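A credibility-weighted factor blends a segment's own experience with a prior in proportion to how much data supports it. The sketch below uses the Bühlmann form Z = n / (n + k) with exposure as n. The numbers and the value of k are invented; in Bühlmann credibility k is the ratio of the expected process variance to the variance of the hypothetical means, and would be estimated from data.

```python
import numpy as np


def credibility_weighted_factor(observed, exposure, prior, k):
    """Bühlmann-style blend: Z = n / (n + k); factor = Z * observed + (1 - Z) * prior."""
    z = exposure / (exposure + k)
    return z * observed + (1.0 - z) * prior, z


observed = np.array([1.30, 0.85, 1.60])    # raw segment relativities (invented)
exposure = np.array([5000.0, 800.0, 40.0])  # policy years per segment
prior = 1.0                                 # portfolio-level relativity
k = 1000.0                                  # credibility constant (invented)

blended, z = credibility_weighted_factor(observed, exposure, prior, k)
print("Z:      ", np.round(z, 3))
print("blended:", np.round(blended, 3))
# The 40-policy segment's 1.60 is pulled almost all the way back to 1.0
```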

Pricing managers evaluating modern tooling

You need to know what is production-ready and what is a research prototype. Each library here has actuarial tests, a clear scope, and outputs a pricing team lead can explain to a committee.

Academic researchers working on insurance pricing methods

We implement recent literature: Manna et al. (2025) on conformal prediction, BYM2 spatial models, variance-weighted non-conformity scores. Reproducible, documented, testable.
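As an illustration of the general technique only, here is a split conformal interval with variance-weighted (normalised) non-conformity scores, |y - prediction| / sigma. It does not reproduce Manna et al. (2025) or insurance-conformal's API: the helper names are ours, the data are synthetic Gamma severities, and the true mean and spread stand in for fitted mean and dispersion models.

```python
import numpy as np


def conformal_interval(y_cal, pred_cal, sigma_cal, pred_new, sigma_new, alpha=0.1):
    """Split conformal interval using variance-weighted non-conformity scores.

    Score for each calibration point: |y - prediction| / sigma, where sigma is
    an estimate of the spread of y at that point.
    """
    scores = np.abs(y_cal - pred_cal) / sigma_cal
    n = len(scores)
    # Finite-sample corrected quantile level, capped at 1
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(scores, level, method="higher")
    return pred_new - q * sigma_new, pred_new + q * sigma_new


rng = np.random.default_rng(0)


def simulate(n):
    """Synthetic severities: Gamma with mean mu(x) and standard deviation mu(x) / 2."""
    x = rng.uniform(0, 2, n)
    mu = 1000 * np.exp(0.5 * x)
    y = rng.gamma(shape=4.0, scale=mu / 4.0)
    return y, mu, mu / 2.0


y_cal, mu_cal, sd_cal = simulate(2_000)
y_test, mu_test, sd_test = simulate(20_000)
lower, upper = conformal_interval(y_cal, mu_cal, sd_cal, mu_test, sd_test, alpha=0.1)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"Empirical coverage: {coverage:.3f}")  # should be close to 0.90
```

Dividing by sigma makes the interval width scale with each risk's expected volatility, rather than giving every policy the same absolute band.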

Ten tools for the problems that matter most

These are the libraries we think are genuinely differentiated. Each addresses a specific hard problem in UK pricing — regulatory compliance, causal inference, uncertainty quantification, smoothing — where no adequate open-source Python tooling existed before. The full portfolio of 34 libraries is below.

From the blog

Practitioner articles on insurance pricing

04 Apr 2026

Five Libraries, One Pipeline: End-to-End Motor Pricing in Python

A single freMTPL2 motor pipeline running through insurance-gam, insurance-conformal, insurance-monitoring, insurance-fairness, and insurance-governance. No other open-source ecosystem does all five.

27 Mar 2026

Open-Source Python Tools for Insurance Pricing: What's Actually Available in 2026

A definitive survey of open-source Python tools for insurance pricing in 2026. General-purpose ML libraries, specialist actuarial packages, the Burning Cost stack, and honest gaps. The post a pricing actuary bookmarks.

25 Mar 2026

Consumer Duty Fair Value Evidencing: A 12-Step Technical Checklist for Pricing Actuaries (2026)

EP25/2 (the FCA's evaluation of GIPP price-walking remedies) flags ongoing fair value supervision in motor and home. No single technical checklist exists for the pricing actuary's portion of the annual fair value assessment. Here is one.

20 Mar 2026

FCA Consumer Duty Pricing Fairness in Python

The FCA expects pricing teams to demonstrate their models don't proxy-discriminate under Consumer Duty. Most teams do this in Excel. Here is how to do it properly in Python, using insurance-fairness.

20 Mar 2026

Fairlearn vs insurance-fairness: Why Generic ML Fairness Tools Miss What the FCA Cares About

Fairlearn is excellent for classification fairness. It was not built for insurance pricing, the Equality Act 2010, or the FCA's specific concern: proxy discrimination in a multiplicative rating model. Here is what the difference means in practice.

09 Mar 2026

Double Machine Learning for Insurance Pricing: Benchmarks and Pitfalls

Where double machine learning beats naive regression for insurance pricing — and where it does not. Benchmarks on 100,000-policy synthetic UK motor data with known ground truth. DML via insurance-causal.
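Double machine learning in its simplest partialling-out form: predict both the outcome and the treatment from the confounders with cross-fitting, then regress outcome residuals on treatment residuals. The sketch uses scikit-learn on invented data with a known effect of -0.8. It is not insurance-causal's API or the article's benchmark design, and `d` stands in for a hypothetical continuous treatment such as a price change.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 5_000
X = rng.normal(size=(n, 5))  # confounders, e.g. risk features (synthetic)
# Treatment depends on the confounders...
d = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
# ...and so does the outcome, which biases a naive regression of y on d
theta_true = -0.8
y = theta_true * d + 2.0 * np.sin(X[:, 0]) + X[:, 1] + rng.normal(size=n)

# Cross-fitted nuisance predictions of E[y | X] and E[d | X]
y_hat = cross_val_predict(GradientBoostingRegressor(random_state=0), X, y, cv=5)
d_hat = cross_val_predict(GradientBoostingRegressor(random_state=0), X, d, cv=5)

# Regress outcome residuals on treatment residuals (no intercept needed)
y_res, d_res = y - y_hat, d - d_hat
theta_dml = np.sum(d_res * y_res) / np.sum(d_res ** 2)

# Naive slope of y on d, ignoring the confounders
theta_naive = np.polyfit(d, y, 1)[0]
print(f"true {theta_true}, DML {theta_dml:.3f}, naive {theta_naive:.3f}")
```

On this data the naive slope has the wrong sign, because the confounders push the treatment and the outcome up together.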

03 Mar 2026

Three-Layer Drift Detection: What PSI and A&E Ratios Miss

Three-layer drift detection: feature drift, segmented calibration, and a Gini test. It tells you whether to recalibrate or refit, going beyond PSI and A/E ratios.
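PSI, one of the baselines the article goes beyond, compares the binned distribution of a feature between a reference period and a recent one: PSI = Σ (a - e) ln(a / e) over bins, where e and a are the reference and recent bin shares. A minimal numpy sketch using deciles of the reference sample as bins; the function and data are ours, not insurance-monitoring's.

```python
import numpy as np


def psi(reference, recent, n_bins=10, eps=1e-6):
    """Population Stability Index with bins at the reference sample's quantiles."""
    reference, recent = np.asarray(reference), np.asarray(recent)
    # Interior cut points at reference quantiles; the outer bins are open-ended
    cuts = np.quantile(reference, np.linspace(0, 1, n_bins + 1)[1:-1])
    ref_counts = np.bincount(np.searchsorted(cuts, reference, side="right"), minlength=n_bins)
    rec_counts = np.bincount(np.searchsorted(cuts, recent, side="right"), minlength=n_bins)
    e = np.clip(ref_counts / len(reference), eps, None)  # avoid log(0) in empty bins
    a = np.clip(rec_counts / len(recent), eps, None)
    return float(np.sum((a - e) * np.log(a / e)))


rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 50_000)  # a standardised rating factor (synthetic)
psi_same = psi(reference, rng.normal(0.0, 1.0, 50_000))
psi_shift = psi(reference, rng.normal(0.5, 1.0, 50_000))
print(f"PSI, no drift: {psi_same:.4f}")
print(f"PSI, mean shifted by 0.5 sd: {psi_shift:.4f}")
```

PSI only looks at the marginal distribution of one feature at a time; it cannot say whether the model is still calibrated within segments, which is what the segmented calibration and Gini layers address.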

01 Mar 2026

From CatBoost to Radar in 50 Lines of Python

Python library distilling CatBoost GBMs into multiplicative GLM factor tables for Radar and Emblem. Open-source GBM-to-GLM distillation for UK pricing teams.

28 Feb 2026

Blending GLMs and GBMs for UK Pricing: Cross-Validated Weights, Not a Choice Between Them

How to combine GLM and GBM predictions for production pricing: cross-validated blend weights, PRA interpretability, and when blending actually helps. Once the blended model is validated, document it in insurance-governance for the SoP3/24 audit trail.

The complete pricing workflow, covered

Start here

Insurance Pricing in Python — 12-module training course
