Part 7: Extracting categorical relativities
In a new cell, type this and run it (Shift+Enter):
TRUE_PARAMS = {
    "area_B": 0.10, "area_C": 0.20, "area_D": 0.35,
    "area_E": 0.50, "area_F": 0.70,
    "ncd_years": -0.15,
    "has_convictions": 0.45,
}
rels = sr.extract_relativities(
    normalise_to="base_level",
    base_levels={
        "area": "A",
        "has_convictions": 0,
    },
)
print("Area relativities:")
print(rels[rels["feature"] == "area"].to_string(index=False))
Note on return type: extract_relativities() returns a pandas DataFrame, not a Polars DataFrame. This is because the underlying SHAP library (which shap-relativities wraps) works with pandas and numpy natively. Methods like .to_string(index=False), .set_index("level"), and .copy() used throughout this module are pandas methods - this is intentional and correct.
You will see:
Area relativities:
feature level relativity lower_ci upper_ci mean_shap shap_std n_obs exposure_weight
area A 1.000 1.000 1.000 -0.613 0.033 9985 7951.2
area B 1.108 1.063 1.155 -0.522 0.037 18042 14368.5
area C 1.225 1.183 1.269 -0.430 0.030 24998 19901.8
area D 1.431 1.381 1.483 -0.278 0.031 22015 17527.7
area E 1.668 1.607 1.731 -0.092 0.034 15048 11982.3
area F 1.950 1.869 2.034 0.110 0.037 10012 7968.0
Your exact numbers will differ slightly from these. The important check is area F: the true DGP has area_F = 0.70, giving exp(0.70) ≈ 2.014. The extracted relativity of 1.950 is close, and its interval (1.869 to 2.034) contains the true value. The difference is partly sampling variation, partly the GBM's imperfect separation of area from other features.
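The same check can be run for every area level, not just F. The sketch below is a minimal illustration: the `extracted` dictionary is transcribed by hand from the example output above (your own run will give slightly different numbers), and the variable names other than `TRUE_PARAMS` are made up for this example.

```python
import math

# True log-scale area effects from the data-generating process (DGP).
TRUE_PARAMS = {
    "area_B": 0.10, "area_C": 0.20, "area_D": 0.35,
    "area_E": 0.50, "area_F": 0.70,
}

# Hand-transcribed from the example output: level -> (relativity, lower_ci, upper_ci).
extracted = {
    "B": (1.108, 1.063, 1.155),
    "C": (1.225, 1.183, 1.269),
    "D": (1.431, 1.381, 1.483),
    "E": (1.668, 1.607, 1.731),
    "F": (1.950, 1.869, 2.034),
}

results = {}
for level, (relativity, lower_ci, upper_ci) in extracted.items():
    true_rel = math.exp(TRUE_PARAMS[f"area_{level}"])
    inside = lower_ci <= true_rel <= upper_ci  # does the CI cover the truth?
    results[level] = (true_rel, inside)
    print(f"area {level}: true={true_rel:.3f}  extracted={relativity:.3f}  in CI: {inside}")
```

In practice you would read the three columns from `rels` rather than typing them in; the hand-typed dictionary just keeps the example independent of the fitted model.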
Now look at NCD. In a new cell, type this and run it (Shift+Enter):
print("NCD relativities:")
ncd_rels = rels[rels["feature"] == "ncd_years"].sort_values("level")
print(ncd_rels[["level", "relativity", "lower_ci", "upper_ci", "n_obs"]].to_string(index=False))
# True DGP: exp(-0.15 * k) for k = 0..5
print("\nTrue DGP NCD relativities:")
for k in range(6):
    print(f" NCD={k}: {np.exp(-0.15 * k):.3f}")
You will see NCD relativities decreasing from 1.000 at NCD=0 to around 0.47-0.50 at NCD=5. The true DGP gives exp(-0.15 × 5) = exp(-0.75) ≈ 0.472. If your NCD=5 relativity is between 0.42 and 0.53, the extraction is recovering the NCD effect as expected.
Now look at convictions. In a new cell, type this and run it (Shift+Enter):
print("Conviction relativities:")
conv_rels = rels[rels["feature"] == "has_convictions"]
print(conv_rels[["level", "relativity", "lower_ci", "upper_ci", "n_obs"]].to_string(index=False))
print(f"\nTrue DGP conviction relativity: exp(0.45) = {np.exp(0.45):.3f}")
You should see the conviction relativity (level=1) somewhere around 1.45-1.65. The true value is exp(0.45) ≈ 1.568. The confidence interval (lower_ci to upper_ci) should comfortably include 1.568.
What each column means
The output includes several columns beyond the relativity itself:
- mean_shap - the exposure-weighted mean SHAP value for this level. The relativity is exp(mean_shap - mean_shap_base).
- shap_std - exposure-weighted standard deviation of SHAP values within this level. Higher values mean more within-level variation - the GBM's predictions for this level are context-dependent.
- n_obs - number of observations at this level.
- exposure_weight - total exposure in years at this level.
- lower_ci / upper_ci - 95% confidence interval on the relativity.
Do not discard these columns when presenting to the pricing committee. The shap_std and n_obs are what you need to explain why one level has a wide CI and another has a narrow CI.
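To make the link between mean_shap and relativity concrete, here is a minimal sketch of the formula exp(mean_shap - mean_shap_base) applied by hand. The rows are synthetic values invented for illustration, and the code is not the library's implementation; it only reproduces the arithmetic described in the column list.

```python
import math
from collections import defaultdict

# Synthetic (level, shap_value, exposure_years) rows for a single feature.
# These numbers are invented for illustration only.
rows = [
    ("A", -0.60, 1.0),
    ("A", -0.62, 0.5),
    ("B", -0.50, 1.0),
    ("B", -0.52, 1.0),
    ("B", -0.54, 0.5),
]

# Accumulate exposure-weighted sums per level.
weighted_sum = defaultdict(float)
total_exposure = defaultdict(float)
for level, shap_value, exposure in rows:
    weighted_sum[level] += shap_value * exposure
    total_exposure[level] += exposure

# Exposure-weighted mean SHAP value per level.
mean_shap = {lvl: weighted_sum[lvl] / total_exposure[lvl] for lvl in weighted_sum}

# Relativity against the base level: exp(mean_shap - mean_shap_base).
base_level = "A"
relativity = {
    lvl: math.exp(m - mean_shap[base_level]) for lvl, m in mean_shap.items()
}

for lvl in sorted(relativity):
    print(f"{lvl}: mean_shap={mean_shap[lvl]:.3f}  relativity={relativity[lvl]:.3f}")
```

The base level always comes out at exactly 1.0 because its own mean is subtracted from itself, which is why area A shows 1.000 for the relativity and both interval bounds.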