Not every problem needs the same tool. This guide maps common pricing challenges to specific libraries, organised by where in the workflow you hit them. If you know what you are trying to do, find it here. If you are not sure where to start, try Getting Started first.
Quick reference
Find the library that matches the task. Click any library name to go to its GitHub repo.
| I want to… | Library |
|---|---|
| extract GLM-style factors from a GBM | shap-relativities |
| run cross-validation without leaking future claims | insurance-cv |
| smooth noisy one-way curves | insurance-whittaker |
| get spatial territory factors | insurance-spatial |
| detect if my model has drifted | insurance-monitoring |
| prove my pricing is not discriminatory | insurance-fairness |
| generate a PRA SS1/23 validation report | insurance-governance |
| get prediction intervals on my GBM | insurance-conformal |
| model frequency and severity jointly | insurance-frequency-severity |
| price a segment with 200 policies | insurance-thin-data |
| deconfound a rating factor from channel correlation | insurance-causal |
| measure whether a rate change worked | insurance-causal-policy |
| set rates with movement caps and a loss ratio target | insurance-optimise |
| run a champion/challenger experiment | insurance-deploy |
| model tail risk beyond the mean | insurance-quantile |
| cluster vehicle groups or occupation codes into bands | insurance-glm-tools |
| model dispersion as well as the mean | insurance-distributional-glm |
| build a broker or fleet-adjusted model | insurance-multilevel |
| apply individual experience rating to a policy | experience-rating |
| generate synthetic training data | insurance-synthetic |
| score telematics trips as GLM-compatible risk | insurance-telematics |
| estimate price elasticity causally | insurance-elasticity |
| correct for a shift in my book mix | insurance-covariate-shift |
Stage 1: Data preparation
Before any modelling: synthetic data for testing, representative training sets, and correcting for the fact that your historical book is not your target book.
You need real-looking motor data to test a method but cannot share your book externally — or you want a benchmark with a known data-generating process to validate whether your model is even recovering the right signal.
Your model development team cannot touch production data, but the synthetic data you have does not preserve the multivariate dependence structure of the real book — correlations between vehicle age, driver age, and claim frequency are wrong.
Your book mix has shifted — you wrote more young drivers last year through a new aggregator — and the model fitted on historical data is mispriced for today's portfolio because it was never trained on this mix.
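One generic remedy is importance weighting: estimate how the mix has shifted and reweight historical rows toward the current book before refitting. Below is a minimal density-ratio sketch on a single hypothetical rating factor, in plain numpy rather than the insurance-covariate-shift API; the ages and bin widths are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mix shift: the historical book skews older than the current book
hist_age = np.clip(rng.normal(45, 12, 5000), 18, 85)
curr_age = np.clip(rng.normal(38, 12, 5000), 18, 85)

bins = np.arange(18, 91, 5)
p_hist, _ = np.histogram(hist_age, bins=bins, density=True)
p_curr, _ = np.histogram(curr_age, bins=bins, density=True)

# Density-ratio weights: reweight historical rows toward the current mix
ratio = np.where(p_hist > 0, p_curr / np.maximum(p_hist, 1e-12), 0.0)
idx = np.clip(np.digitize(hist_age, bins) - 1, 0, len(ratio) - 1)
weights = ratio[idx]
weights *= len(weights) / weights.sum()          # normalise to mean 1

# The weighted historical book now tracks the current age profile
reweighted_mean = np.average(hist_age, weights=weights)
```

The same weights are then passed as sample weights when the frequency or severity model is refitted, so the loss is evaluated under the target mix rather than the historical one.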
Stage 2: Core modelling
Building and validating the technical price — from temporally correct cross-validation through to distributional and thin-data methods.
Your GBM outperforms the production GLM on every holdout metric, but the actuarial team and your rating engine both need multiplicative factor tables — and exp(β) from CatBoost is not a thing.
You used random k-fold CV and your held-out Poisson deviance looks fine, but the model was trained on claims that had not yet developed — IBNR from the last six months is in both your training and validation sets.
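The temporal fix is rolling-origin validation with an embargo gap: training data always ends well before the validation window, so still-developing claims cannot appear on both sides of the split. A minimal sketch, where the 180-day gap is an illustrative stand-in for your claim development horizon:

```python
import numpy as np

def rolling_origin_splits(day, n_splits=3, gap_days=180):
    """Time-ordered CV: each fold trains strictly before its validation
    window, with an embargo gap so recent, still-developing (IBNR-affected)
    claims are excluded from training."""
    day = np.asarray(day)
    order = np.argsort(day)
    for cut in np.quantile(day, np.linspace(0.5, 0.9, n_splits)):
        train = order[day[order] <= cut - gap_days]
        valid = order[day[order] > cut]
        yield train, valid

# Hypothetical exposure dates: four years, expressed as days since book start
day = np.arange(0, 1460, 2)
splits = list(rolling_origin_splits(day, n_splits=3, gap_days=180))
```

Each fold's Poisson deviance is then computed only on policies whose claims were genuinely unseen at training time.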
Your main-effects GLM has unexplained residual structure — one-ways look fine but double-lifts between age and vehicle group show a gap the model cannot explain, and you suspect an interaction you have not captured.
Your point-estimate GBM tells you the expected loss cost per risk, but you have no way to distinguish a low-risk policy with stable claims from a high-risk policy that just happened to be quiet last year — you need per-risk volatility.
Your overdispersion test is significant and the dispersion parameter phi varies by segment — fleet policies have much higher variance than personal lines — but a standard GLM assumes one scalar phi for the whole portfolio.
You want to model dispersion as a function of rating factors without the full complexity of GAMLSS — a simpler double GLM where the mean model and the dispersion model alternate until convergence.
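The alternating scheme can be sketched generically: fit the mean GLM weighted by 1 / phi-hat, regress the unit deviances on the dispersion factors, and repeat until stable. This is a sketch on simulated Gamma severities using scikit-learn's GammaRegressor, not the insurance-distributional-glm API; the fleet/personal split and coefficients are illustrative.

```python
import numpy as np
from sklearn.linear_model import GammaRegressor

rng = np.random.default_rng(1)
n = 4000
x = rng.normal(size=n)                        # mean driver
fleet = rng.integers(0, 2, n).astype(float)   # dispersion driver
mu = np.exp(0.2 + 0.3 * x)
phi = np.where(fleet == 1, 3.0, 1.0)          # fleet variance 3x personal lines
y = rng.gamma(1.0 / phi, mu * phi)            # Gamma: mean mu, variance phi*mu^2

X_mean, X_disp = x.reshape(-1, 1), fleet.reshape(-1, 1)
w = np.ones(n)
for _ in range(5):
    # Mean model, weighted by the current 1/phi-hat
    mean_glm = GammaRegressor(alpha=0.0, max_iter=1000).fit(X_mean, y, sample_weight=w)
    mu_hat = mean_glm.predict(X_mean)
    # Gamma unit deviances are the response for the dispersion model
    d = 2.0 * ((y - mu_hat) / mu_hat - np.log(y / mu_hat))
    disp_glm = GammaRegressor(alpha=0.0, max_iter=1000).fit(X_disp, d)
    w = 1.0 / disp_glm.predict(X_disp)
```

After convergence, `disp_glm` gives a log-linear dispersion model per segment, so fleet business gets its own phi rather than inheriting the portfolio scalar.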
You need the interpretability that a GLM factor table gives you, but you also need the predictive power of a neural network — and you want exact Shapley values rather than approximate SHAP for sign-off.
Your vehicle make variable has 500 levels, your occupation code has 350 — the committee wants pricing bands, not a 500-row factor table, and you need a defensible method for collapsing them that respects the underlying risk signal.
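One defensible recipe is to credibility-shrink each level's observed effect toward zero, so thin levels cannot drive the banding, and then cluster the shrunk effects. A sketch on simulated level data, using k-means purely as the banding step; the variance components and eight-band count are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(8)
n_levels = 500
true_rel = rng.normal(0.0, 0.3, n_levels)        # true log-relativity per make
counts = rng.integers(5, 5000, n_levels)         # policies per level
obs = true_rel + rng.normal(0, 1, n_levels) * 1.5 / np.sqrt(counts)

# Credibility-shrink: Z = n/(n+k), thin levels pulled hard toward zero
k = 1.5**2 / 0.3**2
Z = counts / (counts + k)
shrunk = Z * obs

# Cluster the shrunk effects into a small number of pricing bands
bands = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(
    shrunk.reshape(-1, 1))
```

Because clustering happens on shrunk effects, a ten-policy make with a wild raw relativity lands in a central band instead of getting its own extreme factor.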
Your one-way driver age curve is spiky — the raw relativities jump between adjacent age bands because your data is thin in some cells — and you are eye-balling a smooth curve through it rather than doing this formally.
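Whittaker-Henderson graduation formalises exactly this: minimise exposure-weighted fit error plus a penalty on the second differences of the smoothed curve. A plain-numpy sketch; the penalty weight `lam` and the simulated frequency curve are illustrative choices, not the insurance-whittaker API.

```python
import numpy as np

def whittaker_smooth(y, w, lam=10.0, d=2):
    """Whittaker-Henderson graduation: minimise exposure-weighted fit
    error plus lam times the sum of squared d-th differences."""
    n = len(y)
    D = np.diff(np.eye(n), n=d, axis=0)          # d-th difference operator
    W = np.diag(w)
    return np.linalg.solve(W + lam * (D.T @ D), W @ y)

rng = np.random.default_rng(2)
ages = np.arange(18, 66)
true_curve = 0.2 + 0.15 * np.exp(-(ages - 18) / 10.0)   # hypothetical frequency
exposure = rng.integers(50, 2000, len(ages)).astype(float)
raw = true_curve + rng.normal(0, 1, len(ages)) * 0.05 * np.sqrt(500.0 / exposure)
smooth = whittaker_smooth(raw, exposure / exposure.mean(), lam=50.0)
```

Thin age cells get smoothed hard (their weight is low), while well-exposed cells stay close to their raw relativities, which is the behaviour the eye-balled curve was approximating informally.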
You have broker, scheme, or fleet as a grouping factor with hundreds of levels — some with thousands of policies, some with ten — and you need credibility-weighted group adjustments rather than either ignoring the grouping or overfitting to it.
Your frequency and severity models are fitted independently, but high-frequency risks also tend to have higher severity — you are ignoring a dependence structure that is inflating the premium in some segments and deflating it in others.
Your severity model uses a single Gamma GLM across all loss sizes, but the body and tail of your loss distribution follow completely different physics — attritional claims behave like Gamma, large losses follow a Pareto tail, and one model cannot fit both.
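The standard construction is a spliced distribution: a Gamma fitted below a threshold, a generalised Pareto fitted to the excesses above it, glued together with the empirical tail weight. A scipy sketch on simulated losses; the 50,000 threshold is illustrative (in practice it comes from mean-excess or parameter-stability diagnostics).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Hypothetical losses: Gamma attritional body plus a Pareto-type (GPD) tail
body = rng.gamma(2.0, 3000.0, 9000)
excess_sim = stats.genpareto.rvs(0.4, scale=20_000.0, size=500, random_state=5)
losses = np.concatenate([body, 50_000.0 + excess_sim])

u = 50_000.0                                     # splice threshold
# Body: Gamma MLE on losses at or below the threshold
a, _, scale = stats.gamma.fit(losses[losses <= u], floc=0)
# Tail: generalised Pareto MLE on excesses over the threshold
xi, _, gp_scale = stats.genpareto.fit(losses[losses > u] - u, floc=0)
p_tail = (losses > u).mean()                     # weight on the tail piece
```

The spliced density is then (1 − p_tail) times the truncated Gamma below u plus p_tail times the GPD above it, so attritional pricing and large-loss loading each use the physics that actually fits.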
You need to price for large loss loading or produce an increased limits factor table, but your mean model gives you no handle on the upper tail — you need quantile estimates at the 90th and 99th percentile per risk, not just the expected value.
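Gradient boosting with pinball (quantile) loss gives per-risk quantile estimates directly. A scikit-learn sketch on simulated heteroscedastic severities; the model settings are illustrative, not tuned.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(6)
n = 5000
X = rng.uniform(0, 1, (n, 1))                   # one hypothetical rating factor
# Severity whose tail widens with the rating factor
y = rng.gamma(2.0, 1000.0 * (1.0 + 2.0 * X[:, 0]))

# Pinball loss at alpha=0.90 targets the 90th percentile per risk
q90 = GradientBoostingRegressor(loss="quantile", alpha=0.90,
                                n_estimators=200, max_depth=2).fit(X, y)
coverage = (y <= q90.predict(X)).mean()         # should sit near 0.90
```

Fitting the same model at several alphas gives the per-risk quantile ladder needed for increased limits factors, rather than a single expected value.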
A segment has 200 policies and your GLM estimates are unstable — the standard errors are wide, the coefficient on vehicle age has flipped sign, and a Bayesian approach requires you to specify priors you do not have strong views on.
Stage 3: Specialised techniques
Territory pricing, telematics, causal inference, experience rating, and credibility — problems that need more than a standard GBM.
Your territory factors are eye-balled from a heat map or computed naively from raw one-ways — thin postcode cells get volatile relativities, adjacent postcodes have no relationship to each other, and the factors do not borrow strength from neighbouring areas.
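The usual fix is CAR-style smoothing: penalise differences between neighbouring areas so thin cells borrow strength from their neighbours. A toy sketch with 50 postcodes on a line standing in for a real adjacency graph, in plain numpy rather than the insurance-spatial API; the penalty weight is illustrative.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 50                                          # toy: 50 postcodes on a line
true_rel = 0.3 * np.sin(np.linspace(0, 3, n))   # smooth spatial signal
exposure = rng.integers(10, 500, n).astype(float)
raw = true_rel + rng.normal(0, 1, n) * 1.5 / np.sqrt(exposure)

# Penalise differences between neighbours, weighted by exposure,
# so volatile thin cells are pulled toward their neighbourhood
D = np.diff(np.eye(n), axis=0)                  # neighbour-difference operator
W = np.diag(exposure / exposure.mean())
smooth = np.linalg.solve(W + 5.0 * (D.T @ D), W @ raw)
```

On a real map, `D` becomes the incidence matrix of the postcode adjacency graph (one row per neighbouring pair), and the same solve applies unchanged.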
You have a specialist segment — classic car, high-net-worth, niche occupation — with enough data to price but not enough for a standard GLM to separate signal from noise, and you want to incorporate prior actuarial knowledge formally rather than ad hoc.
Your vehicle value factor looks significant in the GLM, but vehicle value correlates strongly with distribution channel — direct customers buy cheaper cars — and you cannot tell whether it is genuine risk signal or channel confounding.
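The minimal diagnostic is to condition on the suspected confounder: if the vehicle value signal disappears within each channel, it was proxying channel all along. A simulated sketch where value genuinely has no effect on frequency; all figures are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
channel = rng.integers(0, 2, n).astype(float)        # 1 = direct
# Direct customers buy cheaper cars, and channel itself shifts frequency
value = 20_000.0 - 8_000.0 * channel + rng.normal(0, 3_000.0, n)
claims = rng.poisson(np.exp(-2.0 + 0.4 * channel))   # value has NO true effect

# Marginally, value looks like a risk signal (it proxies channel)
naive_corr = np.corrcoef(value, claims)[0, 1]

# Conditioning on channel: the within-channel association vanishes
within = [np.corrcoef(value[channel == c], claims[channel == c])[0, 1]
          for c in (0.0, 1.0)]
```

A conditional association that survives within every channel is evidence of genuine signal; one that collapses, as here, is channel confounding wearing a vehicle-value mask.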
You put through a rate increase in Q3 and conversion dropped — but you cannot tell how much of that drop was the rate change versus market conditions, because you have no control group and the pre/post comparison is confounded by seasonal effects.
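A minimal version of the fix is regression with seasonal controls: model conversion on sine/cosine seasonality terms plus a post-change indicator, so the seasonal confounding is absorbed before the rate-change effect is read off. A simulated sketch with illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(9)
weeks = np.arange(104)                           # two years, weekly
season = 0.05 * np.sin(2 * np.pi * weeks / 52)   # conversion seasonality
post = (weeks >= 78).astype(float)               # rate change at week 78
conv = 0.25 + season - 0.04 * post + rng.normal(0, 0.01, 104)

# Naive pre/post difference is confounded by the season
naive = conv[post == 1].mean() - conv[post == 0].mean()

# Seasonal controls absorb the confounding before reading off the effect
X = np.column_stack([np.ones(104),
                     np.sin(2 * np.pi * weeks / 52),
                     np.cos(2 * np.pi * weeks / 52),
                     post])
beta, *_ = np.linalg.lstsq(X, conv, rcond=None)
effect_hat = beta[3]                             # estimated rate-change effect
```

Here the naive comparison badly overstates the drop because the post-change window falls in the seasonal trough, while the adjusted estimate recovers the true −4 points.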
You have raw 1Hz GPS and accelerometer data from a UBI product and need to turn it into a GLM-compatible risk score — but you have no principled method for aggregating trip-level events into driver-level risk that accounts for the mix of journey types.
Claims frequency has been drifting upward for eighteen months and you need to know whether it is a sustained trend, a structural break, or noise — and you need the answer expressed as a trend index in development factor format for the reserving team.
Your NCD schedule is fixed at five years old and the discount scales were set judgementally — you want to fit actuarially correct credibility weights from your own data and incorporate dynamic claim history into an individual policy posterior.
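The textbook machinery behind this is the Poisson-Gamma conjugate model: with a Gamma(alpha, beta) prior on individual claim frequency, the posterior mean after n years and k claims is (alpha + k) / (beta + n), which yields credibility-correct discounts and loads rather than judgemental scales. A sketch with a hypothetical prior mean of 0.10 claims per year:

```python
# Poisson-Gamma experience rating: portfolio claim frequency ~ Gamma(alpha, beta),
# so an individual's posterior mean frequency after n_years with k_claims
# is (alpha + k_claims) / (beta + n_years). Prior values are illustrative.
ALPHA, BETA = 2.0, 20.0

def posterior_frequency(k_claims: int, n_years: float,
                        alpha: float = ALPHA, beta: float = BETA) -> float:
    return (alpha + k_claims) / (beta + n_years)

prior_mean = ALPHA / BETA                  # 0.10 claims/year before any history
clean_five = posterior_frequency(0, 5)     # five claim-free years -> discount
one_claim = posterior_frequency(1, 5)      # one claim in five years -> load
```

The discount implied by claim-free years and the load implied by a claim both fall out of the same posterior, so the NCD scale is internally consistent instead of being set step by step.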
You want to model lapse risk and customer lifetime value, but standard logistic regression ignores the time structure — policies renewing at different durations have different lapse hazards, and a cross-sectional model is missing that entirely.
You need a price elasticity estimate for rate change planning, but conversion data is observational — the prices customers saw were set by a model, not randomly assigned — and a naive regression of conversion on price is picking up selection effects.
Your technical price model and your commercial rate are disconnected — the pricing committee sets rate change by segment without knowing the conversion and retention response, and FCA GIPP requires you to consider demand effects formally.
Stage 4: Fairness & compliance
Proxy discrimination auditing, Consumer Duty documentation, and PRA model governance — what the FCA and PRA can ask for, answered before they ask.
Your pricing actuary says postcode is a legitimate risk variable, but the compliance team is worried it correlates with ethnicity — and under FCA EP25/2 you need to quantify the indirect discrimination risk, not just assert there is none.
Your model governance committee needs a validation report and you are producing it manually in PowerPoint — bootstrap Gini CI, Poisson A/E CI, double-lift charts, and a renewal cohort test, all formatted to what a model risk function expects.
Stage 5: Deployment & monitoring
Getting a model to production and keeping it honest once it is there.
You want to run a new model in shadow mode before cutting over, but you have no infrastructure for routing quotes deterministically, logging both prices, and eventually running a statistically valid comparison to declare a winner.
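The routing piece is the easy part to sketch: hash the quote identifier with a salt so assignment is deterministic, reproducible, and independent of quote order. A minimal Python sketch; the salt and the 20% challenger share are illustrative assumptions, not the insurance-deploy API.

```python
import hashlib

def assign_variant(quote_id: str, salt: str = "renewal-model-v2",
                   challenger_pct: int = 20) -> str:
    """Deterministic quote routing: the same quote_id always lands in the
    same arm, so champion and challenger prices can both be logged and
    later compared on a stable population."""
    digest = hashlib.sha256(f"{salt}:{quote_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return "challenger" if bucket < challenger_pct else "champion"

# Over many quotes the challenger share converges on the configured 20%
share = sum(assign_variant(f"Q{i:06d}") == "challenger"
            for i in range(10_000)) / 10_000
```

Changing the salt reshuffles the assignment for the next experiment, which keeps successive tests independent of each other.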
Your model has been live for nine months and A/E ratios are creeping up — but you do not know whether to recalibrate the intercept, refit with new data, or whether it is just IBNR noise in the most recent quarter.
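A first-pass diagnostic before deciding anything is an A/E control chart: under the model, monthly A/E on an expected count E fluctuates within roughly 1 ± 2/sqrt(E), so sustained breaches signal drift rather than IBNR noise. A simulated sketch with a 10% drift switched on at month 24; the figures are illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)
months = np.arange(36)
expected = np.full(36, 400.0)                    # model-expected claim counts
drift = np.where(months >= 24, 1.10, 1.00)       # 10% frequency drift late on
actual = rng.poisson(expected * drift).astype(float)

ae = actual / expected
upper = 1.0 + 2.0 / np.sqrt(expected)            # ~2-sigma Poisson control limit
breaches = ae > upper                            # repeated breaches -> drift
```

An isolated breach is consistent with noise; a run of breaches concentrated after a point in time is the pattern that justifies recalibration or a refit rather than waiting for development.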
Your GBM gives a point estimate per risk but Solvency II internal models need uncertainty bounds — and the standard bootstrap approach is slow and the distributional assumptions are wrong for insurance losses.
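Split conformal prediction sidesteps both problems: hold out a calibration set, take a quantile of its absolute residuals, and widen every prediction by that amount. That gives finite-sample marginal coverage with no distributional assumptions and a single model fit. A generic scikit-learn sketch, not the insurance-conformal API:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)
n = 6000
X = rng.uniform(0, 1, (n, 2))
y = 1000.0 * (1.0 + X[:, 0]) + rng.normal(0, 300.0, n)

idx = rng.permutation(n)
tr, cal, te = idx[:3000], idx[3000:4500], idx[4500:]
model = GradientBoostingRegressor(n_estimators=100, max_depth=2).fit(X[tr], y[tr])

# Split conformal: a quantile of calibration residuals widens every prediction
resid = np.abs(y[cal] - model.predict(X[cal]))
q = np.quantile(resid, 0.90 * (1 + 1.0 / len(cal)))
pred = model.predict(X[te])
coverage = ((y[te] >= pred - q) & (y[te] <= pred + q)).mean()
```

The symmetric interval here is the simplest variant; conformalised quantile regression gives risk-dependent widths at the cost of fitting two quantile models.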
You have a technical price per segment, a loss ratio target, and maximum movement caps — and the rate change recommendation is currently done in a spreadsheet where the constraints interact and the solution is not optimal.
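With linear movement caps and a linear loss-ratio constraint this is a small linear programme: minimise premium-weighted movement subject to post-change premium covering losses at the target. A scipy sketch with hypothetical segment figures and a 10% cap; the spreadsheet's interacting constraints become one solve.

```python
import numpy as np
from scipy.optimize import linprog

prem = np.array([4.0, 2.5, 1.5, 2.0]) * 1e6   # current premium by segment
loss = np.array([3.0, 2.0, 1.0, 1.8]) * 1e6   # expected losses by segment
target_lr, cap = 0.75, 0.10
n = len(prem)

# Variables: r_plus and r_minus per segment; r = r_plus - r_minus, each in [0, cap]
# Objective: minimise premium-weighted total movement
c = np.concatenate([prem, prem])
# Loss-ratio constraint: sum prem*(1+r) >= sum loss / target
#   ->  -prem.r_plus + prem.r_minus <= sum prem - sum loss / target
A_ub = np.concatenate([-prem, prem])[None, :]
b_ub = [prem.sum() - loss.sum() / target_lr]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0.0, cap)] * (2 * n))

r = res.x[:n] - res.x[n:]
new_lr = loss.sum() / (prem * (1.0 + r)).sum()
```

Extra business rules (segment-specific caps, dislocation limits by cohort) slot in as additional rows of `A_ub`, which is exactly where the spreadsheet version stops being tractable.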