Part 16: Comparing BYM2 to Emblem postcode groups
Part 16: Comparing BYM2 to Emblem postcode groups¶
This is the practical payoff. We quantify the difference between BYM2 and k-means banding.
Simulating Emblem banding on the synthetic data¶
from sklearn.cluster import KMeans
# k-means on log O/E ratios -- the Emblem approach
K = 8 # number of territory bands
log_oe_for_km = log_oe.reshape(-1, 1)
km = KMeans(n_clusters=K, random_state=42, n_init=10)
km.fit(log_oe_for_km)
band_assignments = km.labels_
# Compute band relativities: mean O/E within each band
band_oe = np.zeros(K)
for k in range(K):
mask = band_assignments == k
band_oe[k] = (claims[mask].sum() / exposure[mask].sum()) / portfolio_freq
# Assign band relativity to each area
naive_band_rel = band_oe[band_assignments]
print(f"Number of bands: {K}")
print(f"Band relativities: {np.sort(band_oe)}")
print()
# Compare range of relativities
print(f"BYM2 relativity range: [{rels['relativity'].min():.4f}, {rels['relativity'].max():.4f}]")
print(f"Band relativity range: [{naive_band_rel.min():.4f}, {naive_band_rel.max():.4f}]")
Visualising the comparison¶
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# BYM2
im0 = axes[0].imshow(rel_grid, cmap="RdYlGn_r", vmin=0.7, vmax=1.5, origin="upper")
axes[0].set_title("BYM2 (smoothed, sector-level)")
plt.colorbar(im0, ax=axes[0])
# k-means banding
band_grid = naive_band_rel.reshape(NROWS, NCOLS)
im1 = axes[1].imshow(band_grid, cmap="RdYlGn_r", vmin=0.7, vmax=1.5, origin="upper")
axes[1].set_title(f"k-means banding (k={K})")
plt.colorbar(im1, ax=axes[1])
plt.tight_layout()
plt.show()
The banded map has sharp discontinuities at band boundaries. Adjacent cells in different bands show discrete jumps. The BYM2 map is spatially smooth: risk changes gradually. Both map approximate the true underlying pattern, but BYM2 does it without introducing artificial discontinuities.
Which is more accurate?¶
bym2_mae = np.abs(rel_values - true_rel_grid.ravel()).mean()
band_mae = np.abs(naive_band_rel - true_rel_grid.ravel()).mean()
print(f"BYM2 MAE vs. truth: {bym2_mae:.4f}")
print(f"k-means banding MAE: {band_mae:.4f}")
# Correlation with true relativities
bym2_corr = np.corrcoef(rel_values, true_rel_grid.ravel())[0, 1]
band_corr = np.corrcoef(naive_band_rel, true_rel_grid.ravel())[0, 1]
print(f"BYM2 correlation: {bym2_corr:.4f}")
print(f"k-means correlation: {band_corr:.4f}")
BYM2 will be more accurate (lower MAE, higher correlation) in most runs. The advantage is most pronounced for sparse areas where k-means banding is dominated by noise.