Part 11: Extracting territory relativities
Once convergence is confirmed and post-fit diagnostics look clean, we extract the territory relativities. These are the multiplicative factors -- one per area -- that go into the rating engine.
rels = result.territory_relativities(credibility_interval=0.95)
print(rels.head(10))
print()
print(f"Columns: {rels.columns}")
print(f"Relativity range: [{rels['relativity'].min():.4f}, {rels['relativity'].max():.4f}]")
print(f"Areas with relativity > 1.10: {(rels['relativity'] > 1.10).sum()}")
print(f"Areas with relativity < 0.90: {(rels['relativity'] < 0.90).sum()}")
The output DataFrame has these columns:
| Column | Meaning |
|---|---|
| area | Area identifier (matches adj.areas) |
| b_mean | Posterior mean of b_i (log scale) |
| b_sd | Posterior SD of b_i (log scale) |
| relativity | exp(b_i - grand_mean_b); multiplicative factor |
| lower | Lower 95% credibility bound on relativity |
| upper | Upper 95% credibility bound on relativity |
| ln_offset | log(relativity) = b_mean - grand_mean_b; ready to use as GLM offset |
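To make "ready to use as GLM offset" concrete, here is a minimal sketch using only NumPy. It assumes an intercept-only Poisson frequency model with a log link, chosen because the fit then has a closed form; a real tariff model would use a GLM library and include other rating factors. The exposure, claim counts, and ln_offset values are made up for illustration.

```python
import numpy as np

# Hypothetical illustration data for three areas; all values are made up.
ln_offset = np.array([-0.20, 0.00, 0.20])     # territory log-relativities, held fixed
exposure = np.array([1000.0, 1500.0, 500.0])  # policy-years per area
claims = np.array([41.0, 75.0, 31.0])         # observed claim counts per area

# In a log-link Poisson GLM an offset enters the linear predictor with its
# coefficient fixed at 1: log(mu_i) = beta0 + log(exposure_i) + ln_offset_i
offset = np.log(exposure) + ln_offset

# For an intercept-only model the Poisson score equation
# sum(claims) = sum(exp(beta0 + offset)) can be solved directly for beta0.
beta0 = np.log(claims.sum() / np.exp(offset).sum())

fitted = np.exp(beta0 + offset)
base_frequency = np.exp(beta0)

print(f"base frequency: {base_frequency:.4f}")
print(f"fitted claims:  {np.round(fitted, 2)}")
```

Because the offset is not re-estimated, the fitted frequency per unit of exposure in each area is the base frequency times that area's relativity, which is the behaviour you want when the territory factor is fixed and the remaining factors are fitted around it.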
Normalisation: By default, relativities are normalised so that their geometric mean is 1, that is, exp(mean(log(rel_i))) = 1. No area is the explicit reference -- the factors multiply to 1 across all areas. This is the natural normalisation for a territory factor that sits alongside other factors in a multiplicative tariff.
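The geometric-mean normalisation can be checked by hand. The sketch below assumes grand_mean_b is the unweighted mean of the b_mean values across areas, which is consistent with the formula above but is not confirmed against the library's internals. The b_mean values are made up.

```python
import numpy as np

# Hypothetical posterior means of the area effects b_i on the log scale.
b_mean = np.array([0.35, 0.10, -0.05, -0.40, 0.22])

# Centre on the unweighted mean across areas, then exponentiate.
grand_mean_b = b_mean.mean()
ln_offset = b_mean - grand_mean_b
relativity = np.exp(ln_offset)

# Both of these should equal 1 up to floating-point error.
geometric_mean = np.exp(np.log(relativity).mean())
product = relativity.prod()

print(np.round(relativity, 4))
print(f"geometric mean: {geometric_mean:.6f}")
print(f"product:        {product:.6f}")
```

Note that the arithmetic mean of these factors is at least 1 (by the AM-GM inequality), and the exposure-weighted mean is generally not 1 either, so the base rate still has to absorb the overall level when the factors are loaded into the tariff.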
If you prefer a specific reference area (e.g., the area containing your HQ, or the historically used baseline territory):
import polars as pl  # skip if already imported earlier in the tutorial

# Normalise to a specific area
rels_ref = result.territory_relativities(base_area="r5c5")
print(rels_ref.filter(pl.col("area") == "r5c5"))
# Should show relativity = 1.0 for the reference area
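Rebasing to a reference area changes only the overall level, not the ratios between areas. The sketch below illustrates the arithmetic under the assumption that base_area divides every point estimate by the reference area's factor; how the library actually handles the rebasing, and the interval bounds in particular, is not confirmed here. Apart from r5c5, the area labels and all values are made up.

```python
import math

# Hypothetical geometric-mean-normalised relativities; values are made up.
rels_geo = {"r5c4": 0.92, "r5c5": 1.15, "r5c6": 1.04}
base_area = "r5c5"

# Rebase: divide every factor by the reference area's factor.
base = rels_geo[base_area]
rels_ref = {area: rel / base for area, rel in rels_geo.items()}

# The ratio between any two areas is unchanged by the rebasing.
ratio_before = rels_geo["r5c6"] / rels_geo["r5c4"]
ratio_after = rels_ref["r5c6"] / rels_ref["r5c4"]

for area, rel in rels_ref.items():
    print(f"{area}: {rel:.4f}")
```

The choice of base therefore affects only where the base rate sits; premiums are identical once the base rate is adjusted by the same constant.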
Sorting and inspection
# Highest-risk areas
print("Top 10 highest risk:")
print(rels.sort("relativity", descending=True).head(10))
print()
print("Top 10 lowest risk:")
print(rels.sort("relativity").head(10))
# Credibility interval width as a measure of uncertainty
rels = rels.with_columns(
(pl.col("upper") - pl.col("lower")).alias("ci_width")
)
print()
print("Widest credibility intervals (most uncertain):")
print(rels.sort("ci_width", descending=True).head(10))
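The widest credibility intervals should correspond to areas with sparse exposure. This is the credibility principle made explicit: areas with more data get tighter estimates. Because ci_width is an absolute width on the relativity scale, it also grows with the relativity itself, so b_sd is the cleaner measure when comparing uncertainty across areas at different risk levels.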
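The difference between absolute and relative interval width can be sketched as follows. This assumes the posterior of b_i is roughly normal on the log scale, so the bounds are approximately relativity * exp(±z * b_sd); the library may compute its bounds from posterior quantiles instead. The function name and the input values are hypothetical.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval

def interval(relativity, b_sd, z=Z95):
    """Approximate bounds, assuming b_i is roughly normal on the log scale."""
    return relativity * math.exp(-z * b_sd), relativity * math.exp(z * b_sd)

# Two hypothetical areas with the same log-scale uncertainty but different levels.
low_lo, low_hi = interval(relativity=0.80, b_sd=0.10)
high_lo, high_hi = interval(relativity=1.40, b_sd=0.10)

# Absolute width (upper - lower) scales with the relativity level...
abs_width_low = low_hi - low_lo
abs_width_high = high_hi - high_lo

# ...while the ratio upper / lower depends only on b_sd.
rel_width_low = low_hi / low_lo
rel_width_high = high_hi / high_lo

print(f"absolute widths: {abs_width_low:.4f} vs {abs_width_high:.4f}")
print(f"upper/lower:     {rel_width_low:.4f} vs {rel_width_high:.4f}")
```

On the rels DataFrame, the same idea would be a column such as (pl.col("upper") / pl.col("lower")).alias("ci_ratio"), or simply sorting by b_sd.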