Burning Cost is at the forefront of machine learning and data science research in UK personal lines insurance. We help pricing teams adopt best practice, best-in-class tooling, and Databricks.
The name comes from a basic actuarial concept: burning cost is claims incurred divided by premium earned. Simple, direct, no mystification. That is how we think about tooling.
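As a minimal worked example of that definition (the function name and the figures are hypothetical, chosen only for illustration):

```python
def burning_cost(claims_incurred: float, premium_earned: float) -> float:
    """Claims incurred divided by premium earned, as defined above."""
    if premium_earned <= 0:
        raise ValueError("premium_earned must be positive")
    return claims_incurred / premium_earned


# Hypothetical portfolio year: 6.5m of incurred claims against 10m of earned premium.
ratio = burning_cost(6_500_000, 10_000_000)
print(f"{ratio:.2%}")  # 65.00%
```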
What we have built
Fifty Python libraries covering the full pricing workflow. See the full library index with pip install commands.
UK pricing teams have adopted GBMs, and CatBoost is now the dominant choice for most new builds. Many are still taking GLM outputs to production, however, because the GBM outputs are not in a form that rating engines, regulators, or pricing committees can work with. The tools here are about closing that gap, from raw data through to a signed-off rate change with an audit trail. All of it runs on Databricks.
Data & Validation
insurance-cv - temporal walk-forward cross-validation with IBNR buffers and sklearn-compatible scorers
insurance-datasets - synthetic UK motor data with a known data-generating process, for testing and teaching
insurance-synthetic - vine copula synthetic portfolio generation preserving multivariate dependence structure
insurance-conformal - distribution-free prediction intervals for insurance GBMs with finite-sample coverage guarantees
insurance-monitoring - exposure-weighted PSI/CSI, actual-vs-expected ratios, and Gini drift z-tests for deployed models
insurance-validation - structured PRA SS1/23 model validation reports covering nine required sections, output as HTML and JSON
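To make the idea of walk-forward cross-validation with an IBNR buffer concrete, here is a rough sketch in plain Python. It is not the insurance-cv API: the function name and its parameters are hypothetical, and it assumes the buffer works by dropping the periods immediately before each test window from training, since their claims are not yet fully developed.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_periods: int,
    min_train: int,
    test_size: int,
    ibnr_buffer: int,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_periods, test_periods) with an expanding training window.

    The `ibnr_buffer` periods immediately before each test window are left out
    of training, on the assumption that their claims are still developing.
    """
    test_start = min_train + ibnr_buffer
    while test_start + test_size <= n_periods:
        train = list(range(0, test_start - ibnr_buffer))
        test = list(range(test_start, test_start + test_size))
        yield train, test
        test_start += test_size  # roll the test window forward


# Twelve accident months, indexed 0..11 (illustrative values only).
folds = list(walk_forward_splits(n_periods=12, min_train=6, test_size=2, ibnr_buffer=2))
for train, test in folds:
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

Training data always precedes the test window in time, and the gap between them keeps under-developed claims out of the fit.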
Model Building
credibility - Bühlmann-Straub credibility in Python with mixed-model equivalence checks
bayesian-pricing - hierarchical Bayesian models for thin-data pricing segments using PyMC 5
insurance-spatial - BYM2 spatial models for postcode-level territory ratemaking, borrowing strength from neighbours
insurance-multilevel - CatBoost combined with REML random effects for high-cardinality categorical groups
insurance-trend - loss cost trend analysis with structural break detection and regime-aware projections
insurance-anam - actuarial neural additive model in PyTorch: interpretable deep learning for pricing
insurance-interactions - automated GLM interaction detection using CANN, NID, and SHAP-based methods
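For readers unfamiliar with Bühlmann-Straub credibility, the standard textbook estimators can be sketched in a few lines of plain Python. This is an illustration only, not the credibility library's API: the function name, the input layout, and the example data are all hypothetical.

```python
from typing import Dict, List, Tuple


def buhlmann_straub(
    data: Dict[str, List[Tuple[float, float]]],
) -> Dict[str, Tuple[float, float]]:
    """Return {group: (credibility factor Z, credibility-weighted estimate)}.

    `data` maps each group to (observed rate, exposure weight) pairs,
    one pair per period.
    """
    if len(data) < 2:
        raise ValueError("need at least two groups")
    dof = sum(len(obs) - 1 for obs in data.values())
    if dof <= 0:
        raise ValueError("need at least one group with two or more periods")

    w_i = {g: sum(w for _, w in obs) for g, obs in data.items()}
    xbar_i = {g: sum(x * w for x, w in obs) / w_i[g] for g, obs in data.items()}
    w_tot = sum(w_i.values())
    xbar = sum(w_i[g] * xbar_i[g] for g in data) / w_tot

    # Expected process variance: exposure-weighted variation within groups.
    epv = sum(
        w * (x - xbar_i[g]) ** 2 for g, obs in data.items() for x, w in obs
    ) / dof

    # Variance of hypothetical means: variation between groups, floored at zero.
    num = sum(w_i[g] * (xbar_i[g] - xbar) ** 2 for g in data) - (len(data) - 1) * epv
    den = w_tot - sum(w ** 2 for w in w_i.values()) / w_tot
    vhm = max(num / den, 0.0)

    if vhm == 0.0:
        # No detectable between-group variation: every group gets the collective mean.
        return {g: (0.0, xbar) for g in data}

    k = epv / vhm
    z = {g: w_i[g] / (w_i[g] + k) for g in data}
    mu = sum(z[g] * xbar_i[g] for g in data) / sum(z.values())
    return {g: (z[g], z[g] * xbar_i[g] + (1 - z[g]) * mu) for g in data}


# Two hypothetical segments, two periods each, unit exposure throughout.
example = {"A": [(1.0, 1.0), (3.0, 1.0)], "B": [(5.0, 1.0), (7.0, 1.0)]}
result = buhlmann_straub(example)
for group, (z_factor, estimate) in result.items():
    print(group, round(z_factor, 3), round(estimate, 3))
```

In this toy example both segments receive Z = 0.875, so each estimate is pulled one eighth of the way from its own mean towards the collective mean of 4.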
Interpretation
shap-relativities - multiplicative rating factor tables from CatBoost models via SHAP, in the same format as exp(beta) from a GLM
insurance-causal - causal inference via double machine learning for deconfounding rating factors
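One way to see how SHAP values can yield a GLM-style relativities table: for a model fitted with a log link, SHAP contributions are additive on the log scale, so differences in average contribution between factor levels exponentiate to multiplicative factors. The sketch below assumes precomputed SHAP values for a single rating factor and uses exposure-weighted means. It is an illustration under those assumptions, not the shap-relativities API, and every name in it is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence


def relativities_from_shap(
    levels: Sequence[str],
    shap_values: Sequence[float],
    exposure: Sequence[float],
    base_level: str,
) -> Dict[str, float]:
    """Exposure-weighted mean SHAP per level, exponentiated relative to base_level.

    Assumes shap_values are on the log scale (log-link objective), so a
    difference in mean SHAP becomes a multiplicative factor.
    """
    weighted_sum: Dict[str, float] = defaultdict(float)
    total_exposure: Dict[str, float] = defaultdict(float)
    for level, value, weight in zip(levels, shap_values, exposure):
        weighted_sum[level] += value * weight
        total_exposure[level] += weight

    mean_shap = {lvl: weighted_sum[lvl] / total_exposure[lvl] for lvl in weighted_sum}
    base = mean_shap[base_level]
    return {lvl: math.exp(m - base) for lvl, m in mean_shap.items()}


# Hypothetical per-policy SHAP values for one factor with levels A and B.
levels = ["A", "A", "B", "B"]
shap_values = [-0.1, -0.1, 0.2, 0.3]
exposure = [1.0, 1.0, 1.0, 3.0]
rel = relativities_from_shap(levels, shap_values, exposure, base_level="A")
print({lvl: round(r, 3) for lvl, r in rel.items()})
```

The base level comes out at exactly 1.0, matching the convention of an exp(beta) table where the reference level carries no loading.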
Tail Risk & Distributions
insurance-distributional-glm - GAMLSS for Python: model ALL distribution parameters as functions of covariates, seven families, RS algorithm
insurance-drn - Distributional Refinement Networks: refine a GLM/GBM baseline distribution covariate-by-covariate using a neural network
insurance-quantile - quantile and expectile GBMs for tail risk, TVaR, and increased limit factors
insurance-distributional - distributional GBMs with Tweedie, Gamma, ZIP, and negative binomial objectives
Commercial
rate-optimiser - constrained rate change optimisation with efficient frontier between loss ratio target and movement cap constraints
insurance-demand - conversion, retention, and DML price elasticity modelling integrated with rate optimisation
insurance-elasticity - causal price elasticity estimation via CausalForestDML and DR-Learner
insurance-optimise - SLSQP portfolio rate optimisation with analytical Jacobians for large factor spaces
experience-rating - NCD and bonus-malus systems for UK motor, including claiming threshold optimisation
insurance-survival - cure models, customer lifetime value, lapse tables, and MLflow wrapper for retention modelling
Compliance & Governance
insurance-fairness - proxy discrimination auditing and FCA Consumer Duty documentation support
insurance-fairness-ot - optimal transport discrimination-free pricing via Lindholm marginalisation, causal path decomposition, and Wasserstein barycenters, with FCA Consumer Duty documentation support
insurance-causal-policy - synthetic difference-in-differences for causal rate change evaluation and FCA evidence packs
insurance-mrm - model risk management: ModelCard, ModelInventory, and GovernanceReport generation
insurance-deploy - champion/challenger framework with shadow mode, rollback, and full audit trail
Infrastructure
burning-cost - the Burning Cost CLI: orchestration for pricing model pipelines
The problem we are solving
UK pricing teams have been building GBMs for years, mostly CatBoost. The models are better than the production GLMs. But many teams are still taking the GLM to production, because the GBM outputs are not in a form that a rating engine, regulator, or pricing committee can work with.
The issue is not technical skill. It is tooling. There is no standard Python library that extracts a multiplicative relativities table from a GBM. There is no standard library that does temporally correct walk-forward cross-validation with IBNR buffers. There is no standard library that builds a constrained rate optimisation a pricing actuary can challenge. There is no standard library that generates a PRA SS1/23-compliant model validation report.
We wrote those libraries because we needed them. Then we kept going. Everything is built to run on Databricks: that is where UK pricing teams are working, and where our research demonstrates best practice.
Training course
We also run a free training course — Modern Insurance Pricing with Python and Databricks — for pricing actuaries and analysts who want to use these tools properly. Twelve modules, written from first principles for insurance, not adapted from generic data science tutorials.
Contact
Email: pricing.frontier@gmail.com
GitHub: github.com/burning-cost