Your loss model was trained on your policyholders. That sounds obvious — of course it was. But your policyholders are not a random sample of risks in the market. They are the risks that saw your price, compared it to competitors, decided it was acceptable, and weren’t declined by underwriting. Every one of those steps is a filter. The model you trained has absorbed those filters into its parameters, and if you now apply that model to quote-stage decisions — renewal pricing, PCW competitiveness, new product design — you are projecting into a population you never measured.
This is sample selection bias. Heckman (1979) formalised it, and it has since been studied extensively in labour economics, credit scoring, and healthcare. In insurance technical pricing, it remains almost entirely uncorrected.
The selection mechanism in three steps
Step 1: Quote to accept. On a price comparison website, roughly two-thirds of UK motor new business now transacts (FCA EP25/2, July 2025). A customer visits, sees multiple quotes, buys from the cheapest insurer that meets their requirements. You win the customers where your risk model made a favourable error — you priced below true expected cost for that risk, which made you competitive. You lose the customers where your model made an unfavourable error — you priced above market for them, so someone else won them. This is the insurance version of the winner’s curse from auction theory: your bound portfolio is a skewed sample of your quotes, skewed toward the risks where your model was too optimistic.
Step 2: The outcome is only observed for the selected. Claims data is only generated by policyholders. If a customer did not bind with you, you never observe their claims. You train a frequency model on bound policies only. The model learns E[claims | risk factors, policy bound]. Not E[claims | risk factors]. The second quantity is what you need for pricing. The first is what you have.
Step 3: The selection depends on the outcome. This is the part that makes the bias systematic rather than merely noisy. Selection (did they bind?) depends on price, which depends on the risk model, which depends on characteristics that also predict claims. The selection indicator S is correlated with the claims outcome Y, even after conditioning on observed risk factors X. Formally, in Heckman’s notation, the error terms in the structural equation (claims) and the selection equation (bind or not) are correlated — ρ ≠ 0. Ordinary regression on the selected sample produces estimates of E[Y | X, S=1], which differs from E[Y | X] by a term proportional to ρ times the inverse Mills ratio evaluated at the selection probability.
This is not a small-data problem. It affects every insurer with market-facing pricing. More data from the same selected book makes the estimates more precise — but precisely wrong.
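The mechanism is easy to verify in a toy simulation. Below, an unobserved risk component drives both the claims rate and the bind decision; the coefficients and the "you underprice when you miss the unobserved risk" rule are illustrative assumptions, not a calibrated model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

x = rng.normal(size=n)                   # observed risk factor
u = rng.normal(size=n)                   # unobserved: drives claims AND your pricing error

# True frequency depends on x and u; your model only sees x
lam = np.exp(-2.0 + 0.5 * x + 0.5 * u)
claims = rng.poisson(lam)

# You underprice exactly when you missed the unobserved risk (u > 0),
# so those risks find you cheap and bind: selection correlated with the outcome
bind = (u + rng.normal(scale=0.5, size=n)) > 0

pop_mean = claims.mean()                 # E[Y] over the whole quote population
sel_mean = claims[bind].mean()           # E[Y | S=1], what your bound book shows
print(f"quote-population frequency: {pop_mean:.4f}")
print(f"bound-book frequency:       {sel_mean:.4f}")
```

The bound-book frequency overshoots the quote-population frequency even though the two samples are identical on the observed factor x: the gap is pure selection, and no amount of extra bound-policy data removes it.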
Why it matters for pricing
The practical consequences run through three channels.
Biased frequency and severity estimates by segment. For segments where you are consistently cheap — say, young urban drivers where your model has a structural underestimate — you attract disproportionate volume. But within that segment, your bound policies are a biased draw: the younger drivers who found you cheapest despite your already-low price are the ones who would have struggled to beat your price elsewhere. Their underlying risk may be different from the segment average in ways you cannot observe. Côté, Côté & Charpentier (SSRN 5018749, November 2024) call this representation bias combined with selection bias: the portfolio is both unrepresentative in composition and internally selected within the represented segments. Models calibrated to this book do not recover market-wide loss ratios, even after observed risk adjustment.
For segments where you are consistently expensive — you have thin data. The few customers who did bind with you in those segments are the ones who found you cheapest despite your high price. They are adverse-selected within the segment. Your model, trained on these customers, will underestimate severity for the segment as a whole if the cheapest-at-your-price customers are systematically lower severity than the segment average.
Biased demand elasticity estimates. If you use your claims data to calibrate a technical premium, and then use that technical premium in a renewal pricing model, the selection bias in the technical model propagates into the elasticity model. At renewal, selection operates again: high-premium customers lapse. Your renewal-year claims data is drawn from the subset of customers who renewed — a sample conditioned on not lapsing. Conditioning on renewal opens a collider path: both your renewal price and the customer’s risk drive the renewal decision, so conditioning on renewal creates a spurious correlation between price and risk in the surviving sample.
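The collider can be shown in a few lines of simulation: draw price and risk independently, let both drive renewal, and the surviving sample exhibits a correlation that exists nowhere in the population. The signs and magnitudes here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

price = rng.normal(size=n)   # standardised renewal price, independent of risk by construction
risk = rng.normal(size=n)    # standardised latent risk, independent of price by construction

# Renewal is the collider: higher price drives lapse, higher risk drives retention
# (sign choices are an illustrative assumption)
renew = (-0.8 * price + 0.8 * risk + rng.normal(size=n)) > 0

corr_all = np.corrcoef(price, risk)[0, 1]
corr_sel = np.corrcoef(price[renew], risk[renew])[0, 1]
print(f"corr(price, risk), all invitations: {corr_all:+.3f}")   # near zero
print(f"corr(price, risk), renewers only:   {corr_sel:+.3f}")   # spurious correlation
```

Any elasticity or loss model fitted on the renewers alone inherits that spurious correlation.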
The winner’s curse interaction. Woodard & Yi (2020, Journal of Risk & Insurance, 87(2)) documented this in US crop insurance: the insurer winning a competitive bid has typically made the most optimistic error in its risk assessment, because winning is precisely the event that signals your estimate was below market. There is no peer-reviewed empirical estimate of this effect’s magnitude in UK personal lines; the Actuaries Digital community has discussed the mechanism qualitatively for large commercial risks. The principle applies in PCW personal lines all the same: your model’s errors determine who you attract, and your book then reflects those errors back to you as apparent performance. It is tempting to read the observed claims as validating the model. They are not validation; they are the selection effect feeding back.
The credit scoring analogue gives us an order of magnitude. Jacobson & Roszbach (2003, Journal of Banking & Finance) and Feelders (2000) found that Heckman corrections on loan approval data shifted parameter estimates by 20–50% when selection was moderate (ρ ~ 0.3–0.5). There is no comparable published UK motor study. We are being explicit about that gap: the bias is theoretically certain and directionally predictable, but its magnitude in UK personal lines data is an open empirical question.
Correction methods
All four methods below require the same fundamental data asset: quote-level data with a bind indicator. You need to observe the selection process — who quoted, who bought. If you are training your loss model only on policies in force, you cannot apply any of these corrections. The data you need almost certainly exists: every UK motor insurer logs quote transactions. The gap is assembling quote logs with bind outcomes and joining to claims, not generating new data.
Heckman two-stage
Heckman’s (1979) solution is to model the selection equation directly. Stage 1: fit a probit model for P(bind | Z), where Z includes all risk factors X plus at least one exclusion restriction — a variable that predicts binding but is theoretically excluded from the claims model. Stage 2: compute the inverse Mills ratio λ = φ(Ẑ’γ̂) / Φ(Ẑ’γ̂) from the probit fitted values, and include it as an additional covariate in the claims regression. The coefficient on λ estimates ρ × σ — the product of the error correlation and the outcome standard deviation.
The exclusion restriction is what makes Heckman work — and what makes it hard. In UK motor:
- PCW rank at time of quote: rank 1 vs rank 5 dramatically affects conversion but should be uncorrelated with expected claims conditional on risk factors. The complication: rank is partly determined by your own technical premium, so it is partially endogenous. Use residual rank after controlling for technical premium as the instrument.
- Number of competing quotes in the session: more options reduce conversion probability but do not directly affect claim risk.
- Rate review timing: commercial loading changes applied at quarterly reviews shift prices without changing the risk model. This is arguably the cleanest instrument available: a pure pricing decision imposed uniformly creates exogenous price variation.
Failure modes: if the exclusion restriction is weak (first-stage F-statistic below 10), the inverse Mills ratio is nearly collinear with X and the estimates become unstable. Heckman also assumes bivariate normality of errors, which is violated for count outcomes. For Poisson frequency models or Gamma severity models, use the control function approach instead (below).
Note: statsmodels 0.14 does not include Heckman in the main package (GitHub issue #1921, open since 2014). Implement manually via statsmodels probit for stage 1 and OLS for stage 2.
Inverse probability weighting
Instead of modelling the outcome corrected for selection, IPW reweights the observed data to look like the full quote population. Estimate the propensity score π̂ᵢ = P(bind | Xᵢ, Dᵢ) using any classifier (gradient boosted trees work well). Weight each bound observation by 1/π̂ᵢ when fitting the claims model. A bound policy from a low-conversion segment counts for more than a bound policy from a high-conversion segment, reflecting the fact that few policies from that segment were observed.
The reweighted loss model estimates E[Y | X] for the quote population rather than E[Y | X, S=1] for the bound population. For UK motor PCW data, the selection model is P(bind | X, own price, PCW rank, competitor presence), which you can estimate from quote log data.
IPW’s weakness is instability when π̂ is near zero — large weights produce high variance. Clip the estimated propensities (or the resulting weights) at the 5th–95th percentile, and use stabilised weights wᵢ = P(S=1)/π̂ᵢ (the marginal conversion rate divided by the propensity) to reduce variance further. The method works best when the selection rate is not too extreme: 20–80% conversion is tractable; 2% conversion creates severe weight instability.
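A sketch of the reweighting on simulated quote data. The coefficients are illustrative, and selection here is driven entirely by observables, so the propensity model is correctly specifiable:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(3)
n = 40_000

X = rng.normal(size=(n, 3))                                   # observed risk factors
p_bind = 1 / (1 + np.exp(-(X @ np.array([0.8, -0.4, 0.3]))))  # true conversion probability
S = rng.uniform(size=n) < p_bind                              # bind indicator, all quotes
lam = np.exp(-1.5 + X @ np.array([0.5, -0.3, 0.2]))           # frequency, aligned with selection
y = rng.poisson(lam)                                          # claims, usable only where S=1

# Propensity model fitted on ALL quotes, then clipped for stability
pi = GradientBoostingClassifier().fit(X, S).predict_proba(X)[:, 1].clip(0.05, 0.95)

truth = lam.mean()
naive = y[S].mean()                   # bound-book frequency: biased upward here
ipw = np.sum(y[S] / pi[S]) / n        # reweighted estimate of the quote-population frequency
print(f"truth {truth:.4f}  naive {naive:.4f}  IPW {ipw:.4f}")

# The same weights feed the frequency model: each bound policy counts 1/pi
glm = PoissonRegressor(alpha=0.0).fit(X[S], y[S], sample_weight=1.0 / pi[S])
```

The Horvitz-Thompson mean and the weighted GLM use the same weights; the first is a quick calibration check, the second is the corrected pricing model.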
Control function approach
For non-linear outcome models — Poisson frequency, Gamma severity — the Heckman approach with its normality assumption is inconsistent. The control function (or 2SRI: two-stage residual inclusion) approach extends to these models correctly (Wooldridge 2015, Journal of Human Resources, 50(2); Terza 2008, Journal of Health Economics, 27).
Stage 1: fit a probit or logit model for selection on Z (including the exclusion restriction). Compute the residual r̂ᵢ = Sᵢ − π̂(Zᵢ). Stage 2: include r̂ᵢ as an additional covariate in the Poisson or Gamma regression on the selected sample.
The coefficient on r̂ᵢ tests for selection bias directly: a t-test on that coefficient is a valid specification test for whether selection is present. If the coefficient is not significantly different from zero, the naive model without selection correction is consistent. If it is significant, the model augmented with r̂ᵢ is the consistent estimate.
This is the most practically useful diagnostic for a pricing team that is unsure whether selection bias is material: fit the augmented model, test the residual coefficient, and the data tells you whether correction matters for your book.
Doubly robust / AIPW
The augmented inverse probability weighted (AIPW) estimator combines an outcome model g(X) = E[Y | X, S=1] and a selection model π̂, but only requires one of the two to be correctly specified for consistency. The estimator for the population mean outcome is:
θ_DR = (1/n) Σ [ g(Xᵢ) + (Sᵢ/π̂ᵢ) × (Yᵢ − g(Xᵢ)) ]
When S is driven entirely by observable X (selection on observables), this is semiparametrically efficient. In practice for insurance: train g on bound policies using your standard Poisson GLM or GBM, train π̂ on all quotes using logistic/GBM for P(bind | X), and use the DR estimator for calibration of the mean predicted loss.
For production use, AIPW is the best default: more robust than Heckman, more efficient than IPW alone. Microsoft EconML implements DRLearner for treatment effect estimation using the same principle. The insurance-causal library implements the selection-corrected version specifically for insurance loss models.
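The estimator can be sketched directly from the formula above, with a GBM for each nuisance model. The data is simulated with illustrative coefficients, and g and π̂ are fitted without the cross-fitting a production implementation would add:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(5)
n = 40_000

X = rng.normal(size=(n, 3))
p_bind = 1 / (1 + np.exp(-(X @ np.array([0.8, -0.4, 0.3]))))
S = rng.uniform(size=n) < p_bind          # bind indicator over all quotes
lam = np.exp(-1.2 + X @ np.array([0.5, -0.3, 0.2]))
y = rng.poisson(lam)                      # claims, usable only where S=1

# Outcome model g(X) = E[Y | X, S=1], trained on bound policies only
g = GradientBoostingRegressor().fit(X[S], y[S]).predict(X)
# Selection model pi(X) = P(bind | X), trained on all quotes
pi = GradientBoostingClassifier().fit(X, S).predict_proba(X)[:, 1].clip(0.05, 0.95)

# theta_DR = mean over quotes of g(X) + (S/pi) * (Y - g(X))
resid = np.where(S, y - g, 0.0)           # Y enters only where it is observed
theta_dr = np.mean(g + (S / pi) * resid)

truth, naive = lam.mean(), y[S].mean()
print(f"truth {truth:.4f}  naive {naive:.4f}  AIPW {theta_dr:.4f}")
```

Note the structure: the g(X) term extrapolates to unselected quotes, and the weighted residual term corrects g wherever the outcome model is off, which is what buys the double robustness.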
Python implementation
The insurance-causal library has two production-ready selection correction estimators. These are the only Python implementations we are aware of that handle insurance pricing selection problems specifically.
SelectionCorrectedElasticity corrects for renewal selection bias in price elasticity estimation. It uses an efficient influence function approach with IPW correction, including Manski-style sensitivity bounds for unobserved selection:
from insurance_causal.autodml.selection import SelectionCorrectedElasticity
# S=1 for renewers, S=0 for lapsers; Y=0 (not NaN) for non-renewers
est = SelectionCorrectedElasticity(outcome_family='gaussian', n_folds=5)
est.fit(X, D, Y, S)
result = est.estimate()
print(result) # AME with confidence interval, corrected for selection
# Sensitivity analysis: how much does unobserved selection confounding matter?
bounds = est.sensitivity_bounds(gamma_grid=[1.0, 1.5, 2.0, 3.0])
# Gamma=2: unobserved selection odds can at most double or halve
# Bounds widen monotonically; tells you how credible the correction is
The pi_hat (selection propensity) is fitted by GradientBoostingClassifier internally and clipped at [0.05, 0.95] for stability. Non-renewers must have Y=0 before calling fit() — not NaN.
DualSelectionDML handles a harder problem: claim severity, which is only observed for policies that both renewed and submitted a claim. That is two sequential selection steps. Standard selection correction handles one binary selection variable; DualSelectionDML handles multivariate ordinal selection using the control function approach from Dolgikh & Potanin (arXiv:2511.12640, 2025):
from insurance_causal.autodml import DualSelectionDML
import numpy as np
# Z: selection indicators stacked — e.g. [renewal_indicator, claim_indicator]
Z = np.column_stack([renewal_indicator, claim_indicator])
est = DualSelectionDML(estimand='ATE', n_folds=5, nuisance_backend='catboost')
est.fit(Y, D, Z, X, W_Z=exclusion_restriction_matrix)
result = est.estimate()
sensitivity_df = est.sensitivity(rho_range=(-0.5, 0.5))
When W_Z (exclusion restrictions) is not provided, the library issues a UserWarning and falls back to functional form identification — less credible, but available for data situations where no clean instrument exists.
Cross-fitting in DualSelectionDML uses double sample splitting within each complement fold: one split fits the conditional CDF models for the control functions, the other fits the outcome and propensity models conditional on those control functions. This avoids Neyman orthogonality violations from using the same data for both stages.
Data requirements
None of these methods work without quote-level data with bind indicators. For renewal selection:
- All renewal invitation records, not just those that renewed
- Quoted premium, customer and risk characteristics, renewal indicator
- Where renewal = 1: subsequent claims
For new business PCW selection:
- All PCW quote records from your quote engine logs
- Own price, bind indicator, customer characteristics
- Where bind = 1: claims
Most UK motor insurers log every quote generated. The bind indicator is in the quote log. The gap is typically retention: quote logs are often purged after 6–12 months. For selection correction to work on 3–5 year loss development windows, you need quote logs retained for the same period. Start retaining them now if you are not.
The ideal additional data: competitor prices at quote time from PCW data feeds. This enables a much better selection model — P(bind | X, own rank, competitor prices) rather than P(bind | X, own price alone). UK PCW data feeds include some competitor price information (typically rank positions); the commercial data is substantially underused for this purpose.
Honest limitations
No published UK empirical bias magnitudes. There is no peer-reviewed paper that quantifies the magnitude of selection bias on loss model parameters for UK personal lines. Côté et al. (2024) establish the existence and direction theoretically and via simulation. The credit scoring analogue suggests parameter estimates differ by 20–50% under moderate selection. For UK motor specifically, the bias magnitude is unknown. It depends on your market position, pricing model accuracy, and PCW share — all of which vary. The theoretical case for correction is solid; the empirical case that it materially changes your numbers requires testing on your own data.
The exclusion restriction is hard. Heckman and control function approaches require a variable that predicts selection but not outcomes. In insurance, genuinely clean instruments are rare. Rate review timing is the most defensible candidate; it still requires assuming that commercial loading changes were not correlated with anticipated claims cost changes, which is not always true. When no credible exclusion restriction is available, IPW and AIPW under selection-on-observables assumptions are more credible than Heckman with a weak or invalid instrument.
Data requirements are steep. If you have not retained quote logs with bind outcomes, you cannot apply these methods. The correction cannot be bootstrapped from bound-policy data alone — you need the denominator of the selection process. This is a data infrastructure problem, not a modelling problem, and it requires a decision to fix before you have the data to use.
Selection on observables vs unobservables. IPW and AIPW are consistent only when selection is fully explained by observed covariates — there are no unobserved drivers of both selection and outcomes. For renewal pricing, this means no unobserved customer characteristics that simultaneously predict lapse and claims. This is unlikely to hold exactly. The sensitivity bounds in SelectionCorrectedElasticity (via sensitivity_bounds()) quantify how much the estimate changes if unobserved selection confounding exists. Run them and report the bounds, not just the point estimate.
Where this fits in the stack
Selection bias in the training data is upstream of every other model quality problem. A better GBM architecture, more features, better hyperparameter tuning — all of these make the selection-biased model more precisely wrong. Fixing the selection problem first makes every downstream improvement meaningful.
The practical priority order: first, check whether you have quote-level data with bind indicators and start retaining it if not. Second, run the control function test on your renewal data — include the residual from a selection probit in your frequency model and test its significance. If the coefficient is not significant, your current model may not require correction for the renewal book. Third, if selection is present, implement IPW or AIPW using SelectionCorrectedElasticity for the renewal pricing use case. Fourth, if you have severity selection (claims only observed for claimants), use DualSelectionDML.
This is not a theoretical refinement. If your book’s selection is moderate — plausibly the case for any PCW-heavy motor insurer — the price relativities in your current model may be systematically biased in the same direction that made you competitive for those risks in the first place. That circularity is the mechanism. The correction breaks it.
References
- Heckman, J.J. (1979). Sample Selection Bias as a Specification Error. Econometrica, 47(1), 153–161.
- Côté, O., Côté, M.-P. & Charpentier, A. (2024). Selection Bias in Insurance: Why Portfolio-Specific Fairness Fails to Extend Market-Wide. SSRN 5018749.
- Chiappori, P.-A. & Salanié, B. (2000). Testing for Asymmetric Information in Insurance Markets. Journal of Political Economy, 108(1), 56–78.
- Wooldridge, J.M. (2015). Control Function Methods in Applied Econometrics. Journal of Human Resources, 50(2), 420–445.
- Terza, J.V. (2008). Two-Stage Residual Inclusion Estimation: Addressing Endogeneity in Health Econometric Modeling. Journal of Health Economics, 27, 531–543.
- Rosenbaum, P.R. & Rubin, D.B. (1983). The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika, 70(1), 41–55.
- Jacobson, T. & Roszbach, K. (2003). Bank Lending Policy, Credit Scoring and Value at Risk. Journal of Banking & Finance, 27, 615–633.
- Dolgikh, A. & Potanin, M. (2025). Causal Effect Estimation Under Multivariate Ordinal Selection. arXiv:2511.12640.
- FCA (2025). Evaluation Paper 25/2: An evaluation of our General Insurance Pricing Practices (GIPP) remedies. Financial Conduct Authority, July 2025.
- Burning Cost insurance-causal: SelectionCorrectedElasticity and DualSelectionDML for selection bias correction in insurance pricing.