Skip to content

Part 9: Reading the NID scores

Part 9: Reading the NID scores

# Raw NID table (before GLM testing)
nid_table = detector.nid_table()
print("Top 10 NID candidates:")
print(nid_table.head(10))

The NID table contains:

Column Meaning
feature_1, feature_2 The candidate pair
nid_score Raw NID score (unnormalised)
nid_score_normalised Score rescaled to [0, 1] — 1.0 = highest-ranked pair
# Plot the NID scores
top_n = 15
top_nid = nid_table.head(top_n)
labels = [f"{r['feature_1']} × {r['feature_2']}" for r in top_nid.iter_rows(named=True)]

plt.figure(figsize=(10, 5))
plt.barh(range(top_n), top_nid["nid_score_normalised"].to_list(), color="#2271b3")
plt.yticks(range(top_n), labels)
plt.xlabel("NID score (normalised)")
plt.title("Top interaction candidates from NID")
plt.gca().invert_yaxis()
plt.tight_layout()
plt.show()

What to look for: The pair age_band × vehicle_group should rank first or second. The second planted interaction — ncd_years × conviction_points — should also appear in the top 5. If neither appears in the top 10, the CANN training may not have converged — try increasing cann_n_epochs to 500 or reducing the learning rate to 5e-4.

Understanding the NID scores in context

The NID scores are relative, not absolute. A pair with nid_score_normalised = 1.0 is the strongest interaction detected by the CANN. A pair with nid_score_normalised = 0.2 has one-fifth the detected signal. But "strongest detected" does not mean "statistically significant" — some pairs with high NID scores will fail the LR test because the CANN found a pattern that does not survive GLM testing.

The NID step is a fast screening tool. We then take the top 15 candidates to the slower, more rigorous LR test step.