Open-Source Tools for Insurance Pricing in Python

The Python ecosystem for insurance pricing has grown substantially since 2022. This page lists the tools we consider worth knowing about — our own libraries alongside the broader open-source landscape. We include external tools where they are genuinely the best available option, and we note where gaps remain.

All tools listed are open-source and Python-based unless stated otherwise. Links go to GitHub repositories.

Rating Factor Analysis & Interpretability

Tools for extracting, inspecting, and communicating rating factors from pricing models.

Tool	Description
shap-relativities (Burning Cost)	Extracts GLM-style multiplicative rating relativities from CatBoost GBMs using SHAP values. Benchmarked at +2.85pp Gini lift over direct GLM on synthetic UK motor data.
insurance-distill (Burning Cost)	GBM-to-GLM distillation — fits a surrogate Poisson/Gamma GLM to GBM predictions and exports multiplicative factor tables for Radar/Emblem rating engines. 90–97% R² match on benchmarks.
insurance-gam (Burning Cost)	EBM and Neural Additive Model for interpretable pricing — shape functions per rating factor give the transparency of a GLM with GBM-level predictive power. Includes exact Shapley values and factor table output.
SHAP	The standard Python library for Shapley value explanations. Works with CatBoost, XGBoost, LightGBM, and scikit-learn models. The starting point before reaching for anything more specialised.
interpret	Microsoft’s Explainable Boosting Machine (EBM) implementation. GAM with automatic pairwise interaction detection. Directly usable in insurance pricing without a wrapper.
glum	QuantCo’s high-performance GLM library. The correct choice for Poisson/Gamma/Tweedie GLMs in Python — faster than statsmodels, proper exposure offsets, L1/L2/elastic-net regularisation, formula interface. ~130k PyPI downloads/month as of March 2026. v3.2.0 adds Polars support.
insurance-glm-tools (Burning Cost)	GLM tooling — nested GLM embeddings, R2VF factor level clustering, territory banding, SKATER spatial clustering. Complements glum rather than replacing it.
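
To make the idea of turning SHAP values into multiplicative relativities concrete, here is a minimal sketch. It assumes a log-link model, so per-policy SHAP contributions are additive on the log scale, and it assumes the relativity for a level is the exponentiated exposure-weighted mean contribution relative to a base level. The function name `relativities_from_contributions` and the sample data are hypothetical. This is not the shap-relativities API, and that library may use a different method.

```python
import math
from collections import defaultdict


def relativities_from_contributions(levels, contributions, exposures, base_level):
    """Exposure-weighted mean log-scale contribution per level,
    exponentiated relative to a chosen base level (hypothetical helper)."""
    weighted_sum = defaultdict(float)
    total_exposure = defaultdict(float)
    for level, contrib, exposure in zip(levels, contributions, exposures):
        weighted_sum[level] += contrib * exposure
        total_exposure[level] += exposure
    mean_contrib = {k: weighted_sum[k] / total_exposure[k] for k in weighted_sum}
    base = mean_contrib[base_level]
    # exp(difference on the log scale) gives a multiplicative factor.
    return {k: math.exp(v - base) for k, v in mean_contrib.items()}


# Made-up per-policy SHAP values for a single rating factor (log scale).
levels = ["A", "A", "B", "B", "C"]
contributions = [-0.10, -0.10, 0.05, 0.15, 0.30]
exposures = [1.0, 0.5, 1.0, 1.0, 0.25]

relativities = relativities_from_contributions(levels, contributions, exposures, "A")
```

The base level always comes out at 1.0, which matches how factor tables are usually presented in a GLM.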

Fairness & Discrimination Testing

Tools for proxy discrimination auditing and algorithmic fairness in pricing.

UK context: the FCA’s Consumer Duty and the Equality Act 2010 create specific obligations around indirect discrimination. Generic fairness libraries were built for binary classification without exposure weighting, so they require adaptation for Poisson/Gamma pricing models.

Tool	Description
insurance-fairness (Burning Cost)	Proxy discrimination auditing aligned to FCA Consumer Duty and Equality Act 2010. Exposure-weighted bias metrics for multiplicative (log-space) models. The proxy R² method catches postcode proxies that Spearman correlation misses entirely (proxy R² = 0.78 vs Spearman r = 0.06 in benchmarks).
Fairlearn	Microsoft’s fairness library. Covers demographic parity, equalized odds, and mitigation algorithms. Built for binary classification — the metrics require adaptation for insurance regression, and there is no exposure weighting. Useful as a secondary check on classification sub-models.
AIF360	IBM’s AI Fairness 360 toolkit. 70+ fairness metrics, pre/in/post-processing bias mitigations. Same caveat as Fairlearn: designed for classification. Documentation and maintenance quality have declined since 2023.

Causal Inference

Tools for moving beyond correlation — deconfounding rating factors, measuring price elasticity, and evaluating rate changes.

Tool	Description
insurance-causal (Burning Cost)	Double machine learning (DML) for deconfounding rating factors, plus causal forest for heterogeneous treatment effects. Use DML for portfolio-level average effects; causal forest for segment-level CATEs with n≥2,000 per group. Includes price elasticity estimation.
insurance-causal-policy (Burning Cost)	Synthetic difference-in-differences for evaluating rate changes — event study, HonestDiD sensitivity bounds, FCA evidence pack output. v0.2.0 adds DoublyRobustSCEstimator: 24% lower RMSE than SDID with few comparison groups.
EconML	Microsoft’s econometric causal ML library. The reference implementation of double machine learning, causal forests, and IV methods. insurance-causal builds on EconML and adds insurance-specific wrappers, confounding diagnostics, and documentation for pricing use cases.
DoWhy	PyWhy’s causal reasoning library. DAG-based identification, refutation tests, and causal discovery. Useful for formalising causal assumptions before fitting a DML model.
CausalML	Uber’s uplift modelling library — T-learner, S-learner, X-learner, causal forests. Primarily built for marketing/conversion uplift; works for price response estimation with adaptation.

Model Monitoring & Drift Detection

Tools for detecting when a deployed pricing model has gone stale.

Insurance-specific note: generic drift tools do not implement exposure-weighted PSI, actual-vs-expected ratios, or Gini drift tests. They are useful for feature drift detection but insufficient for full pricing model monitoring.

Tool	Description
insurance-monitoring (Burning Cost)	Exposure-weighted PSI/CSI, A/E ratios with Garwood CIs, Gini drift z-test, and PITMonitor for calibration drift via e-process martingale. mSPRT sequential testing holds 1% FPR where peeking t-tests reach 25%. InterpretableDriftDetector attributes drift to feature interactions with BH FDR control. v0.8.0.
Evidently	Apache 2.0. 100+ metrics for data quality, feature drift (PSI, KS, Wasserstein), and model performance. Dashboard UI, MLflow integration. v0.7.20 as of January 2026. Pivoting toward LLM observability — insurance teams are not the primary audience. Useful for feature distribution monitoring.
NannyML	Apache 2.0. Confidence-Based Performance Estimation (CBPE) estimates model performance without ground-truth labels. Univariate and multivariate drift. v0.13.1, July 2025. CBPE applies to calibrated classifiers — not directly to Poisson/Gamma regression. Useful for binary conversion model monitoring.
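
As an illustration of what exposure weighting changes, the sketch below computes PSI with each bin's share taken as its share of exposure rather than of row count. The function name, the `eps` floor, and the sample data are assumptions made for this example. It is not the insurance-monitoring API.

```python
import math


def exposure_weighted_psi(ref_bins, ref_exposure, cur_bins, cur_exposure, eps=1e-6):
    """PSI where each bin's share is its share of exposure, not of row count."""
    bins = sorted(set(ref_bins) | set(cur_bins))

    def shares(bin_ids, exposure):
        totals = {b: 0.0 for b in bins}
        for b, e in zip(bin_ids, exposure):
            totals[b] += e
        grand_total = sum(totals.values())
        # Floor at eps so an empty bin does not produce log(0).
        return {b: max(totals[b] / grand_total, eps) for b in bins}

    p = shares(ref_bins, ref_exposure)
    q = shares(cur_bins, cur_exposure)
    return sum((q[b] - p[b]) * math.log(q[b] / p[b]) for b in bins)


# Made-up example: same three age bands, but exposure shifts towards "young".
bands = ["young", "mid", "old"]
psi = exposure_weighted_psi(bands, [1.0, 2.0, 1.0], bands, [2.0, 1.0, 1.0])
```

With one row per band in both periods, a row-count PSI would be zero here. The exposure-weighted version picks up the shift in mix.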

Conformal Prediction

Distribution-free prediction intervals with finite-sample coverage guarantees.

Tool	Description
insurance-conformal (Burning Cost)	Five non-conformity scores tuned for Tweedie and Poisson claims. The `pearson_weighted` score gives 13.4% narrower intervals than parametric Tweedie at identical 90% coverage. Frequency-severity conformal intervals, online retrospective adjustment (RetroAdj), Solvency II SCR bounds. v0.6.0.
insurance-conformal-ts (Burning Cost)	Conformal prediction for non-exchangeable claims time series — ACI, EnbPI, SPCI, MSCP, Poisson/NB non-conformity scores.
MAPIE	The standard Python conformal prediction library. v1.3.0. Works with any sklearn-compatible estimator. The natural starting point — insurance-conformal adds the Tweedie/Poisson-specific non-conformity measures and insurance regulatory output that MAPIE does not include.
crepes	Conformal regressors and predictive systems. Simpler API than MAPIE, useful for normalised conformal prediction and Mondrian (conditional coverage) approaches.
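
For readers new to the technique, a minimal split conformal sketch follows. It uses the absolute residual as the non-conformity score, which is the simplest choice and not one of the Tweedie/Poisson-specific scores described above. The function name and data are hypothetical, and none of the listed libraries is used.

```python
import math


def split_conformal_halfwidth(calib_actual, calib_predicted, alpha=0.1):
    """Half-width q such that [prediction - q, prediction + q] has at least
    1 - alpha marginal coverage, assuming exchangeable data."""
    scores = sorted(abs(y - yhat) for y, yhat in zip(calib_actual, calib_predicted))
    n = len(scores)
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return scores[k - 1]


# Made-up calibration set: 19 held-out points with residuals 1, 2, ..., 19.
calib_actual = [float(i) for i in range(1, 20)]
calib_predicted = [0.0] * 19
halfwidth = split_conformal_halfwidth(calib_actual, calib_predicted, alpha=0.1)
```

The guarantee is marginal and relies on a calibration set that the model was not fitted on. For claims data, constant-width intervals like these are usually too wide on low-risk policies and too narrow on high-risk ones, which is the motivation for scaled scores.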

Model Governance & Validation

Tools for structured model validation and regulatory documentation.

Tool	Description
insurance-governance (Burning Cost)	PRA SS1/23-compliant model validation reports. Bootstrap Gini CI, Poisson A/E CI, double-lift charts, renewal cohort test. HTML/JSON output structured for model risk committees. Catches miscalibration that manual checklists miss.

No comparable open-source tools exist for insurance-specific model governance as of March 2026. Generic ML model cards (e.g., Google’s Model Card Toolkit) do not cover actuarial validation tests.

Credibility & Experience Rating

Classical actuarial credibility methods in Python.

Tool	Description
insurance-credibility (Burning Cost)	Bühlmann-Straub credibility with mixed-model equivalence checks, Bayesian experience rating, and individual experience rating (static, dynamic, surrogate, and deep attention variants). 6.8% MAE improvement on thin schemes in benchmarks.

No significant external open-source Python credibility libraries exist. chainladder-python (CAS) covers claims reserving but not rating credibility.
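
For reference, a minimal sketch of the textbook nonparametric Bühlmann-Straub estimator in plain Python. The function name and data layout are assumptions for this example, not the insurance-credibility API. The complement of credibility is the weight-averaged overall mean. Some formulations use a credibility-weighted mean instead. The sketch assumes at least one group has two or more periods.

```python
def buhlmann_straub(observations, weights):
    """observations, weights: dicts mapping group -> list of per-period values.
    Returns group -> (credibility factor Z, credibility-weighted estimate)."""
    group_weight = {g: sum(w) for g, w in weights.items()}
    group_mean = {
        g: sum(w * x for w, x in zip(weights[g], observations[g])) / group_weight[g]
        for g in observations
    }
    total_weight = sum(group_weight.values())
    overall_mean = sum(group_weight[g] * group_mean[g] for g in group_mean) / total_weight

    # Expected process variance (within-group variation).
    within = sum(
        w * (x - group_mean[g]) ** 2
        for g in observations
        for w, x in zip(weights[g], observations[g])
    )
    degrees = sum(len(x) - 1 for x in observations.values())
    epv = within / degrees

    # Variance of hypothetical means (between-group variation), floored at zero.
    between = sum(group_weight[g] * (group_mean[g] - overall_mean) ** 2 for g in group_mean)
    norm = total_weight - sum(w ** 2 for w in group_weight.values()) / total_weight
    vhm = max((between - (len(group_mean) - 1) * epv) / norm, 0.0)

    result = {}
    for g in group_mean:
        z = group_weight[g] / (group_weight[g] + epv / vhm) if vhm > 0 else 0.0
        result[g] = (z, z * group_mean[g] + (1 - z) * overall_mean)
    return result


# Made-up loss experience: two schemes, two years each, unit exposure.
observations = {"A": [1.0, 3.0], "B": [5.0, 7.0]}
weights = {"A": [1.0, 1.0], "B": [1.0, 1.0]}
credibility = buhlmann_straub(observations, weights)
```

When the between-group variance estimate is not positive, every group gets Z = 0 and falls back to the overall mean.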

Portfolio Optimisation

Constrained rate optimisation subject to profitability, retention, and regulatory constraints.

Tool	Description
insurance-optimise (Burning Cost)	SLSQP constrained rate optimisation with analytical Jacobians, FCA ENBP constraints, efficient frontier, and ParetoFrontier for multi-objective optimisation across profit, retention, and fairness. Includes demand modelling (conversion/retention elasticity). Benchmarked at +143.8% profit lift over flat rate loading.

No open-source Python insurance rate optimisation library existed before insurance-optimise. Financial portfolio optimisers (PyPortfolioOpt, skfolio) use Markowitz/HRP methods that do not transfer to insurance pricing constraints.

Cross-Validation for Insurance

Tool	Description
insurance-cv (Burning Cost)	Temporal walk-forward cross-validation that respects policy time structure and IBNR buffers, with sklearn-compatible scorers. Walk-forward detects 10.5% optimism that k-fold hides on insurance data.

Standard scikit-learn TimeSeriesSplit does not handle IBNR buffers or exposure-weighted insurance scoring. insurance-cv adds these on top of a sklearn-compatible API.
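
The buffer idea can be sketched in a few lines of plain Python. The generator below is a hypothetical illustration, not the insurance-cv API. Its name, parameters, and the choice of whole periods as the unit are all assumptions.

```python
def walk_forward_splits(periods, n_test_periods=1, buffer_periods=1, min_train_periods=2):
    """Yield (train_idx, test_idx) pairs over time-ordered period labels.

    The buffer drops the periods immediately before each test window from
    training, since their claims are typically not yet fully reported.
    """
    unique = sorted(set(periods))
    first_test = min_train_periods + buffer_periods
    for start in range(first_test, len(unique) - n_test_periods + 1, n_test_periods):
        train_set = set(unique[: start - buffer_periods])
        test_set = set(unique[start : start + n_test_periods])
        train_idx = [i for i, p in enumerate(periods) if p in train_set]
        test_idx = [i for i, p in enumerate(periods) if p in test_set]
        yield train_idx, test_idx


# Made-up policy years for seven policies.
policy_years = [2019, 2019, 2020, 2021, 2021, 2022, 2023]
splits = list(walk_forward_splits(policy_years))
```

In this example the first fold trains on 2019 to 2020 and tests on 2022, with 2021 held out as the buffer. The second fold trains on 2019 to 2021 and tests on 2023.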

Model Deployment & Champion/Challenger

Tool	Description
insurance-deploy (Burning Cost)	Champion/challenger routing with shadow mode, SHA-256 deterministic assignment, SQLite quote log, bootstrap LR test, and ENBP audit trail.

MLflow handles experiment tracking and model registry but does not implement champion/challenger routing with insurance-specific KPI tracking. BentoML and Seldon handle serving but not A/B routing with quote-level logging.
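
Deterministic assignment by hash is simple to sketch with the standard library. The function below is a hypothetical illustration of the general technique, not the insurance-deploy API. The key format, the experiment label, and the 20% challenger share are assumptions.

```python
import hashlib


def assign_arm(policy_id, experiment="exp-1", challenger_share=0.2):
    """Map a policy ID to 'champion' or 'challenger' reproducibly."""
    digest = hashlib.sha256(f"{experiment}:{policy_id}".encode("utf-8")).digest()
    # First 6 bytes as an integer, scaled to [0, 1).
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return "challenger" if bucket < challenger_share else "champion"


# The same policy always lands in the same arm, with no stored state.
arm_first_call = assign_arm("POL-000123")
arm_second_call = assign_arm("POL-000123")
```

Including the experiment label in the hashed key means a new experiment reshuffles policies rather than reusing the previous split.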

Datasets

Public datasets for benchmarking insurance pricing models.

Tool / Dataset	Description
insurance-datasets (Burning Cost)	Synthetic UK motor and home portfolios with known data-generating process parameters. Use this to verify that your model recovers true relativities before applying it to real data. Polars output supported.
freMTPL2freq / freMTPL2sev via OpenML	French motor third-party liability dataset. The standard public benchmark for insurance GLMs and GBMs. ~678k policies, claim frequency and severity. Available via `sklearn.datasets.fetch_openml` or directly from OpenML.
CASdatasets	R package containing 40+ actuarial datasets (motor, property, health). Python users must read the CSV exports. Maintained by Charpentier (UQAM). No Python package.

General ML Tools Commonly Used in Pricing

These are not insurance-specific, but any pricing team working in Python will use them.

Tool	Description
CatBoost	Yandex’s gradient boosting library. The GBM of choice for insurance pricing — native categorical variable handling without one-hot encoding, built-in Tweedie/Poisson/Gamma objectives, fast CPU training. Consistently outperforms LightGBM on high-cardinality categorical data common in motor pricing.
LightGBM	Microsoft’s GBM library. Faster training than CatBoost on large datasets with few categoricals. The standard alternative when CatBoost is slow.
glum	Listed above under Rating Factor Analysis — worth repeating here. If you are fitting GLMs in Python, use glum, not statsmodels.
Polars	Fast DataFrame library written in Rust. Handles the 10M+ row portfolios where pandas is slow. Several Burning Cost libraries support `polars=True` output. The v1.x API is stable.
scikit-learn	Pipeline infrastructure, preprocessing, model selection, and scoring. The scaffolding most pricing libraries are built on.
PyMC	Bayesian modelling in Python. Used in insurance-spatial (BYM2 territory models), bayesian-pricing, and insurance-credibility for hierarchical models. v5.x.

This list is maintained by Burning Cost. Corrections and additions welcome — open an issue on any of our repositories.