shap-relativities

v0.2.3

Extract multiplicative rating relativities from gradient boosting models (GBMs) using SHAP values. Built for insurance pricing.

What it does

GBMs often outperform GLMs in predictive accuracy on insurance pricing datasets, but their standard output, a single predicted value per row, is not directly useful to pricing teams. Underwriters and actuaries work with relativities: a table of multipliers, one per feature level, that explains how each risk characteristic adjusts the base rate.

shap-relativities converts a trained CatBoost model into exactly that format. The output is directly comparable to GLM exp(beta) relativities: a Polars DataFrame of (feature, level, relativity) triples where the base level is 1.0 and the relativities multiply together with the base rate (see `.baseline()`) to give the model's expected prediction.
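To make the multiplicative structure concrete, here is a minimal sketch of how a pricing team might apply such a table. The base rate and relativity values are made up for illustration, the "Leeds" level is invented, and `price_risk` is a hypothetical helper that is not part of the package.

```python
from math import prod

# Illustrative numbers only; not output from a real model.
base_rate = 0.08  # e.g. expected claim frequency at the base levels
relativities = {
    ("area", "London"): 1.00,  # base level
    ("area", "Leeds"): 0.85,
    ("ncd_years", 0): 1.00,    # base level
    ("ncd_years", 5): 0.60,
}

def price_risk(risk: dict, base: float, table: dict) -> float:
    """Multiply the base rate by the relativity of each feature level (hypothetical helper)."""
    return base * prod(table[(feature, level)] for feature, level in risk.items())

rate = price_risk({"area": "Leeds", "ncd_years": 5}, base_rate, relativities)
base_case = price_risk({"area": "London", "ncd_years": 0}, base_rate, relativities)
```

A risk sitting at every base level is priced at the base rate itself, which is what makes the table read like a set of GLM relativities.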

Install

uv add shap-relativities
# or, for CatBoost + plotting support:
uv add "shap-relativities[all]"

Quick start

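from shap_relativities import SHAPRelativities

# Assumes catboost_model is an already-trained CatBoost model and df is a
# Polars DataFrame holding the rating factors and an "exposure" column.
sr = SHAPRelativities(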
    model=catboost_model,
    X=df.select(["area", "ncd_years", "vehicle_age"]),
    exposure=df["exposure"],
    categorical_features=["area", "ncd_years"],
)
sr.fit()

rels = sr.extract_relativities(
    normalise_to="base_level",
    base_levels={"area": "London", "ncd_years": 0},
)
print(rels)

Or use the one-shot convenience wrapper:

from shap_relativities import extract_relativities

# model is the trained CatBoost model; X is the feature matrix, as above.
rels = extract_relativities(
    model, X,
    exposure=df["exposure"],
    categorical_features=["area"],
)

Public API

Name	Description
`SHAPRelativities`	Main class. Wraps a trained model and feature matrix, computes SHAP values, and extracts relativities with CIs.
`.fit()`	Compute SHAP values via TreeExplainer. Must be called before extracting relativities.
`.extract_relativities()`	Return a Polars DataFrame of (feature, level, relativity, lower_ci, upper_ci, ...).
`.extract_continuous_curve()`	Smoothed relativity curve for a continuous feature (LOESS or isotonic).
`.validate()`	Diagnostic checks: reconstruction error, feature coverage, sparse levels.
`.baseline()`	exp(expected_value) — the annualised base rate in prediction space.
`.to_dict()` / `.from_dict()`	Serialise and restore a fitted instance without the original model.
`extract_relativities()`	Convenience function for one-shot use. Calls fit() and extract_relativities() internally.
`datasets.load_motor()`	Synthetic UK motor portfolio dataset with known true parameters for validation.
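As background for `.fit()`, `.baseline()` and the reconstruction check in `.validate()`: assuming a log-link objective (which the log-space columns suggest), TreeExplainer's SHAP values are additive in log space, so the expected value plus a row's SHAP values should reproduce that row's log prediction. The sketch below illustrates that identity with made-up numbers. It is not the library's implementation, and `reconstruction_error` is a hypothetical helper.

```python
import math

expected_value = math.log(0.08)  # log-space baseline (illustrative)
shap_rows = [                    # per-feature SHAP values, one list per row
    [0.10, -0.30, 0.05],
    [-0.20, 0.00, 0.15],
]
# Predictions on the response scale, constructed here to satisfy additivity.
predictions = [math.exp(expected_value + sum(row)) for row in shap_rows]

def reconstruction_error(expected_value, shap_rows, predictions):
    """Largest absolute gap between log(prediction) and expected_value + sum(SHAP)."""
    return max(
        abs(math.log(pred) - (expected_value + sum(row)))
        for row, pred in zip(shap_rows, predictions)
    )

baseline = math.exp(expected_value)  # the exp(expected_value) quantity
max_err = reconstruction_error(expected_value, shap_rows, predictions)
```

With a real model, a gap well above floating-point noise would usually point to a mismatch between the model's link function and the space in which the SHAP values were computed.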

Output columns

Column	Description
`feature`	Feature name
`level`	Feature level (the category value for categorical features, or the per-observation value for continuous features)
`relativity`	Multiplicative relativity. Base level = 1.0
`lower_ci`	Lower bound of the 95% confidence interval (central limit theorem approximation)
`upper_ci`	Upper bound of the 95% confidence interval (central limit theorem approximation)
`mean_shap`	Exposure-weighted mean SHAP value in log space
`shap_std`	Weighted standard deviation of SHAP values
`n_obs`	Observation count for this level
`exposure_weight`	Total exposure weight for this level
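The columns above could be derived from per-row SHAP values roughly as follows. This is a sketch under assumptions, not the package's actual code: the standard error (weighted standard deviation divided by the square root of `n_obs`) and the decision to ignore the base level's own uncertainty are guesses at what a CLT-style interval might look like, and `level_summary` is a hypothetical helper.

```python
import math

def level_summary(shap_values, exposures):
    """Exposure-weighted mean and standard deviation of one level's SHAP values."""
    total = sum(exposures)
    mean = sum(w * s for w, s in zip(exposures, shap_values)) / total
    var = sum(w * (s - mean) ** 2 for w, s in zip(exposures, shap_values)) / total
    return {
        "mean_shap": mean,
        "shap_std": math.sqrt(var),
        "n_obs": len(shap_values),
        "exposure_weight": total,
    }

# Illustrative log-space SHAP values and exposures for two levels of one feature.
base = level_summary([0.02, -0.02, 0.00], [1.0, 1.0, 0.5])
other = level_summary([0.30, 0.34, 0.26, 0.30], [1.0, 0.5, 0.5, 1.0])

# Normalise to the base level in log space, then exponentiate to get a multiplier.
centred = other["mean_shap"] - base["mean_shap"]
se = other["shap_std"] / math.sqrt(other["n_obs"])  # assumed standard error
z = 1.96                                            # normal quantile for a 95% interval
relativity = math.exp(centred)
lower_ci = math.exp(centred - z * se)
upper_ci = math.exp(centred + z * se)
```

Because the interval is built in log space and then exponentiated, it is asymmetric around the relativity and never drops below zero.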