Part 2: Spatial autocorrelation -- the concept you are exploiting

What it means

Spatial autocorrelation is the observation that nearby things tend to be more similar than distant things. In insurance data this pattern is almost always present. Nearby postcode sectors tend to have similar:

Theft rates (crime tends to cluster geographically)
Road network quality and accident rates
Socioeconomic composition, which correlates with vehicle type, mileage, and maintenance
Weather exposure (flood risk, ice, urban microclimate)

This is not a problem to be corrected -- it is information to be exploited. If we know sector A has elevated claims frequency and sector B is adjacent to A, we should be less surprised to learn that sector B also has elevated frequency. The adjacency is informative.

The degree of spatial autocorrelation is usually measured with Moran's I. Before fitting a spatial model, we run Moran's I on the residuals from a non-spatial model (the log observed-to-expected ratio per sector). If Moran's I is significantly positive, spatial smoothing is warranted. If it is not significant, the data give no evidence of spatial structure and we should stick with a simpler approach.

This is the key discipline in spatial modelling: test first, smooth second. Do not assume spatial structure. Test for it.
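The residual that goes into the test can be sketched as follows. This is a minimal illustration, assuming you already have observed claim counts and expected counts per sector from a non-spatial model. The function name `log_oe_residuals` and the 0.5 offset (which stops sectors with zero claims producing log(0)) are hypothetical choices, not part of the method described here.

```python
import math


def log_oe_residuals(observed, expected, offset=0.5):
    """Log observed-to-expected ratio per sector.

    observed: claim counts per sector
    expected: expected claim counts per sector from a non-spatial model
    offset:   hypothetical continuity correction so that sectors with zero
              claims do not produce log(0); 0.5 is an assumption, not a rule
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    residuals = []
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected counts must be positive")
        # Adding the offset to both sides keeps the ratio at 1 when o == e
        residuals.append(math.log((o + offset) / (e + offset)))
    return residuals


residuals = log_oe_residuals([12, 0, 30], [10.0, 4.0, 30.0])
```

The offset pulls the ratio towards 1 for sectors with very small expected counts, which is usually reasonable for thin data, but its size is a judgement call.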

Moran's I in plain English

Moran's I is the spatial equivalent of an autocorrelation coefficient. It compares each area's value to the average of its neighbours' values. If areas with high values tend to be surrounded by other areas with high values (and low by low), the statistic is positive; if high values tend to sit next to low values, it is negative. If the values are arranged at random across areas, the statistic is close to its expected value of -1/(N-1), which is approximately zero for large N.
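To make the "average of its neighbours" idea concrete, here is a toy sketch with four areas in a row, each adjacent only to the areas beside it. The function names and the toy layout are hypothetical. In the clustered arrangement the end areas' neighbour averages share their own sign and the middle two average to zero; in the alternating arrangement every neighbour average has the opposite sign to the area's own value, which is what pushes Moran's I negative.

```python
def neighbour_averages(values, neighbours):
    """Average of each area's neighbours' values (often called the spatial lag).

    neighbours[i] lists the indices of the areas adjacent to area i.
    """
    averages = []
    for i, nbrs in enumerate(neighbours):
        if not nbrs:
            raise ValueError(f"area {i} has no neighbours")
        averages.append(sum(values[j] for j in nbrs) / len(nbrs))
    return averages


def expected_morans_i(n):
    """Expected value of Moran's I when values are arranged at random over n areas."""
    return -1.0 / (n - 1)


# Four areas in a row: 0 - 1 - 2 - 3
line = [[1], [0, 2], [1, 3], [2]]

clustered = [1, 1, -1, -1]     # high next to high, low next to low
alternating = [1, -1, 1, -1]   # high next to low everywhere

print(neighbour_averages(clustered, line))    # [1.0, 0.0, 0.0, -1.0]
print(neighbour_averages(alternating, line))  # [-1.0, 1.0, -1.0, 1.0]
print(expected_morans_i(4))                   # about -0.333
```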

The formal statistic is:

I = (N / S0) * (z^T W z) / (z^T z)

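where z is the vector of demeaned values (log O/E per sector), W is the row-standardised adjacency matrix (W[i,j] = 1/number_of_neighbours[i] when j is a neighbour of i, and 0 otherwise), S0 is the sum of all weights, and N is the number of areas. Because each row of a row-standardised W sums to 1, S0 = N and the leading factor N/S0 equals 1. Row standardisation also assumes every area has at least one neighbour, so islands with no adjacent sectors need separate handling.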
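The formula translates directly into code. The sketch below assumes the adjacency is supplied as a list of neighbour indices per area (a hypothetical input format; real postcode-sector adjacency would come from boundary data) and builds the row-standardised weights on the fly rather than storing W as a dense N-by-N matrix. It uses only the standard library.

```python
def morans_i(values, neighbours):
    """Moran's I using row-standardised adjacency weights.

    values:     one value per area (e.g. log O/E per sector)
    neighbours: neighbours[i] lists the indices of areas adjacent to area i
    Assumes every area has at least one neighbour and the values are not all equal.
    """
    n = len(values)
    if len(neighbours) != n:
        raise ValueError("need one neighbour list per area")
    mean = sum(values) / n
    z = [v - mean for v in values]          # demeaned values

    s0 = 0.0      # S0: sum of all weights
    cross = 0.0   # z^T W z
    for i, nbrs in enumerate(neighbours):
        if not nbrs:
            raise ValueError(f"area {i} has no neighbours")
        w = 1.0 / len(nbrs)                 # W[i, j] for each neighbour j
        for j in nbrs:
            cross += z[i] * w * z[j]
            s0 += w

    sum_sq = sum(zi * zi for zi in z)       # z^T z
    if sum_sq == 0:
        raise ValueError("Moran's I is undefined when all values are equal")
    return (n / s0) * cross / sum_sq
```

Computing S0 explicitly mirrors the formula; with row-standardised weights it always equals N. On a toy line of four areas (neighbours `[[1], [0, 2], [1, 3], [2]]`), the clustered values `[1, 1, -1, -1]` give I = 0.5 and the alternating values `[1, -1, 1, -1]` give I = -1.0.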

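We compute p-values by permutation: randomly shuffling the z values across areas and recomputing I many times. Because we are testing for positive autocorrelation, the permutation p-value is the proportion of permuted statistics at least as large as the observed statistic, usually computed as (count + 1)/(permutations + 1) so that the observed arrangement counts as one of the permutations. This avoids relying on a normal approximation to the distribution of I, which can be poor for the irregular geometries typical of UK postcode geography.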
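A permutation test on top of the statistic can be sketched as below. It is one-sided (it counts permuted statistics at least as large as the observed one, since the question is whether autocorrelation is positive) and adds 1 to the numerator and denominator so the observed arrangement counts as one of the permutations, a common convention. The function names, the default of 999 permutations, the fixed seed, and the toy grid standing in for real postcode-sector adjacency are all assumptions for illustration.

```python
import random


def morans_i(values, neighbours):
    """Moran's I with row-standardised weights (every area needs a neighbour)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = 0.0
    cross = 0.0
    for i, nbrs in enumerate(neighbours):
        if not nbrs:
            raise ValueError(f"area {i} has no neighbours")
        w = 1.0 / len(nbrs)
        for j in nbrs:
            cross += z[i] * w * z[j]
            s0 += w
    sum_sq = sum(zi * zi for zi in z)
    if sum_sq == 0:
        raise ValueError("Moran's I is undefined when all values are equal")
    return (n / s0) * cross / sum_sq


def permutation_p_value(values, neighbours, n_perm=999, seed=0):
    """One-sided permutation test for positive spatial autocorrelation.

    Returns (observed Moran's I, p-value).
    """
    rng = random.Random(seed)
    observed = morans_i(values, neighbours)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)               # reassign values to areas at random
        if morans_i(shuffled, neighbours) >= observed:
            at_least_as_large += 1
    # +1 in numerator and denominator: the observed arrangement is itself one permutation
    return observed, (at_least_as_large + 1) / (n_perm + 1)


def grid_neighbours(rows, cols):
    """Hypothetical toy geography: rook adjacency on a rows-by-cols grid."""
    nbrs = []
    for r in range(rows):
        for c in range(cols):
            cell = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cell.append(rr * cols + cc)
            nbrs.append(cell)
    return nbrs
```

On a 6-by-6 grid with the left half set to 1 and the right half to 0, the observed I is 22/27 (about 0.81), and with 999 permutations the p-value should come out very small; the smallest value this construction can return is 1/1000.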