shap-relativities

v0.2.3

Extract multiplicative rating relativities from GBMs using SHAP values. Built for insurance pricing.


What it does

GBMs outperform GLMs on most insurance pricing datasets, but their standard output, a single predicted value per row, is not directly useful to pricing teams. Underwriters and actuaries work with relativities: a table of multipliers, one per feature level, that shows how each risk characteristic adjusts the base rate.

shap-relativities converts a trained CatBoost model into exactly that format. The output is directly comparable to GLM exp(beta) relativities: a Polars DataFrame of (feature, level, relativity) triples where the base level is 1.0 and relativities multiply together to give the model's expected prediction.
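The multiplicative reading follows from SHAP additivity on the log scale: the log prediction is the expected value plus the per-feature SHAP values, so exponentiating turns the sum into a product of factors. The snippet below is a minimal sketch of that identity with made-up numbers; it does not call the library, and the feature values and base rate are illustrative assumptions.

```python
import math

# Hypothetical SHAP decomposition for a single policy, on the log scale.
# All numbers here are illustrative, not output from a real model.
expected_value = math.log(0.08)  # log of an assumed base rate
shap_values = {"area": 0.25, "ncd_years": -0.40, "vehicle_age": 0.05}

# SHAP values are additive in log space ...
log_prediction = expected_value + sum(shap_values.values())

# ... so they become multiplicative factors in prediction space.
factors = {name: math.exp(value) for name, value in shap_values.items()}
prediction = math.exp(expected_value) * math.prod(factors.values())

# Both routes give the same prediction.
assert math.isclose(prediction, math.exp(log_prediction))

for name, factor in factors.items():
    print(f"{name}: x{factor:.3f}")
print(f"prediction: {prediction:.4f}")
```

Averaging the per-row SHAP values within each feature level, then exponentiating, is what turns these row-level factors into a table of relativities.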

Install

uv add shap-relativities
# or, for CatBoost + plotting support:
uv add "shap-relativities[all]"

Quick start

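from shap_relativities import SHAPRelativities

# catboost_model is assumed to be an already-trained CatBoost model, and df a
# Polars DataFrame holding the rating factors plus an "exposure" column.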

sr = SHAPRelativities(
    model=catboost_model,
    X=df.select(["area", "ncd_years", "vehicle_age"]),
    exposure=df["exposure"],
    categorical_features=["area", "ncd_years"],
)
sr.fit()

rels = sr.extract_relativities(
    normalise_to="base_level",
    base_levels={"area": "London", "ncd_years": 0},
)
print(rels)

Or use the one-shot convenience wrapper:

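from shap_relativities import extract_relativities

# model and X are assumed to be a trained model and its feature matrix,
# prepared the same way as in the class-based example.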

rels = extract_relativities(
    model, X,
    exposure=df["exposure"],
    categorical_features=["area"],
)

Public API

Name                          Description
SHAPRelativities              Main class. Wraps a trained model and feature matrix, computes SHAP values, and extracts relativities with CIs.
.fit()                        Compute SHAP values via TreeExplainer. Must be called before extracting relativities.
.extract_relativities()       Return a Polars DataFrame of (feature, level, relativity, lower_ci, upper_ci, ...).
.extract_continuous_curve()   Smoothed relativity curve for a continuous feature (LOESS or isotonic).
.validate()                   Diagnostic checks: reconstruction error, feature coverage, sparse levels.
.baseline()                   exp(expected_value): the annualised base rate in prediction space.
.to_dict() / .from_dict()     Serialise and restore a fitted instance without the original model.
extract_relativities()        Convenience function for one-shot use. Calls fit() and extract_relativities() internally.
datasets.load_motor()         Synthetic UK motor portfolio dataset with known true parameters for validation.
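For the isotonic option mentioned under .extract_continuous_curve(), the underlying idea is a monotone fit to the per-value mean SHAP values. The following is a minimal sketch of weighted pool-adjacent-violators in pure Python; isotonic_fit is a hypothetical helper written for illustration, not the library's implementation, and the input values are made up.

```python
import math

def isotonic_fit(y, w):
    """Weighted pool-adjacent-violators: non-decreasing fit to y.

    y must already be ordered by the continuous feature's value.
    """
    blocks = []  # each block: [weighted mean, total weight, point count]
    for value, weight in zip(y, w):
        blocks.append([value, weight, 1])
        # Merge backwards while neighbouring blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Illustrative mean SHAP values (log space) at increasing feature values.
mean_shap = [0.00, 0.10, 0.05, 0.20]
exposure = [1.0, 1.0, 1.0, 1.0]

fitted = isotonic_fit(mean_shap, exposure)
curve = [math.exp(v) for v in fitted]  # back to multiplicative relativities
print(curve)
```

The second and third points violate monotonicity, so they are pooled into a single weighted mean; exponentiating the fitted values gives a non-decreasing relativity curve.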

Output columns

Column            Description
feature           Feature name
level             Feature level (category value or per-observation value for continuous features)
relativity        Multiplicative relativity. Base level = 1.0
lower_ci          Lower bound of 95% CLT confidence interval
upper_ci          Upper bound of 95% CLT confidence interval
mean_shap         Exposure-weighted mean SHAP value in log space
shap_std          Weighted standard deviation of SHAP values
n_obs             Observation count for this level
exposure_weight   Total exposure weight for this level
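As a rough illustration of how these columns relate to each other, the sketch below aggregates per-row SHAP values for one categorical feature. It is not the library's code: level_relativities is a hypothetical helper, the input numbers are made up, and the interval formula (mean plus or minus 1.96 times shap_std / sqrt(n_obs), exponentiated after centring on the base level) is an assumption about what a 95% CLT interval could look like here.

```python
import math
from collections import defaultdict

def level_relativities(levels, shap, exposure, base_level, z=1.96):
    """Aggregate per-row SHAP values for one feature into per-level rows."""
    groups = defaultdict(list)
    for lvl, s, w in zip(levels, shap, exposure):
        groups[lvl].append((s, w))

    stats = {}
    for lvl, rows in groups.items():
        total_w = sum(w for _, w in rows)
        mean = sum(s * w for s, w in rows) / total_w  # exposure-weighted mean
        var = sum(w * (s - mean) ** 2 for s, w in rows) / total_w
        stats[lvl] = (mean, math.sqrt(var), len(rows), total_w)

    base_mean = stats[base_level][0]
    out = []
    for lvl, (mean, std, n, total_w) in sorted(stats.items()):
        half_width = z * std / math.sqrt(n)  # assumed CLT standard error
        centred = mean - base_mean           # base level maps to exp(0) = 1.0
        out.append({
            "level": lvl,
            "relativity": math.exp(centred),
            "lower_ci": math.exp(centred - half_width),
            "upper_ci": math.exp(centred + half_width),
            "mean_shap": mean,
            "shap_std": std,
            "n_obs": n,
            "exposure_weight": total_w,
        })
    return out

rows = level_relativities(
    levels=["London", "London", "North", "North"],
    shap=[0.10, 0.14, -0.20, -0.10],
    exposure=[1.0, 1.0, 0.5, 1.5],
    base_level="London",
)
for row in rows:
    print(row)
```

Centring on the base level in log space before exponentiating is what pins the base level's relativity at 1.0.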