Part 12: Setting up alerts
Why email notifications are not enough
The Databricks job notifications from Part 11 tell you the job ran and whether it succeeded or failed. They do not tell you whether the monitoring results showed anything concerning. A job can succeed (no errors) and produce a red traffic light (model has drifted significantly). You need a separate mechanism to alert on the content of the results, not just the execution status.
There are three ways to implement content-based alerting in Databricks: Databricks SQL Alerts, email sent from the monitoring notebook over SMTP, and a Microsoft Teams or Slack webhook. We cover all three.
Approach 1: Databricks SQL Alerts
Databricks SQL Alerts run a SQL query on a schedule and send a notification if the query returns results. They are the right tool when you want to alert based on data in a Delta table - which is exactly what we have after Part 10.
In your Databricks workspace, go to SQL in the left sidebar, then Alerts, then Create Alert.
Configure the alert query:
SELECT
    run_date,
    model_name,
    ae_ratio,
    overall_traffic_light,
    psi_score
FROM main.motor_monitoring.monitoring_log
WHERE
    overall_traffic_light IN ('AMBER', 'RED')
    AND run_date >= DATE_SUB(CURRENT_DATE(), 7)
ORDER BY run_date DESC
Alert condition: Has rows (trigger the alert if the query returns any rows).
Schedule: run once per day (the alert will only trigger in the 7 days following a monitoring run, because that is the window in the WHERE clause).
Notification: add the pricing team email address.
This gives you a daily check on whether any recent monitoring run produced an amber or red result. If everything is green, the query returns no rows and no alert fires.
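To make the alert condition concrete, here is a minimal Python sketch of the same filter applied to in-memory rows. It is only an illustration of the query logic, not part of the pipeline: the function name `rows_triggering_alert` and the sample rows are hypothetical, and the column names are taken from the query above.

```python
from datetime import date, timedelta

ALERT_LEVELS = {"AMBER", "RED"}


def rows_triggering_alert(rows, today, window_days=7, levels=ALERT_LEVELS):
    """Return the rows the alert query would select, newest first.

    Mirrors: overall_traffic_light IN (...) AND
             run_date >= DATE_SUB(CURRENT_DATE(), window_days)
    """
    cutoff = today - timedelta(days=window_days)
    hits = [
        row for row in rows
        if row["overall_traffic_light"] in levels and row["run_date"] >= cutoff
    ]
    return sorted(hits, key=lambda row: row["run_date"], reverse=True)


# Hypothetical monitoring log rows, for illustration only
sample_rows = [
    {"run_date": date(2024, 3, 1), "overall_traffic_light": "RED"},
    {"run_date": date(2024, 3, 9), "overall_traffic_light": "GREEN"},
    {"run_date": date(2024, 3, 8), "overall_traffic_light": "AMBER"},
]

hits = rows_triggering_alert(sample_rows, today=date(2024, 3, 10))
alert_fires = len(hits) > 0  # "Has rows" condition
```

In this sample the red run from 1 March falls outside the 7-day window, so only the amber run from 8 March triggers the alert.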
A more targeted alert: red flags only
Create a second alert that escalates immediately to the head of pricing for red results:
SELECT
    run_date,
    model_name,
    ae_ratio,
    ae_ci_lower,
    ae_ci_upper,
    gini_cur,
    psi_score,
    overall_traffic_light
FROM main.motor_monitoring.monitoring_log
WHERE
    overall_traffic_light = 'RED'
    AND run_date >= DATE_SUB(CURRENT_DATE(), 3)
Set this alert to run every hour (or every 30 minutes if you want faster escalation). A red flag should not sit unnoticed for a day.
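The two-tier escalation (team for amber and red, head of pricing for red only) can also be written down as a small lookup, which is handy if you later want to reuse the same routing in code. This is a sketch under assumptions: the `ESCALATION` table and `recipients_for` helper are hypothetical names, and the addresses are the placeholder addresses used in this tutorial.

```python
# Hypothetical routing table: who is notified at each severity level
ESCALATION = {
    "GREEN": [],
    "AMBER": ["pricing@yourcompany.com"],
    "RED": ["pricing@yourcompany.com", "head.of.pricing@yourcompany.com"],
}


def recipients_for(traffic_light: str) -> list:
    """Return the recipients for a traffic light value (case-insensitive)."""
    key = traffic_light.upper()
    if key not in ESCALATION:
        raise ValueError(f"Unknown traffic light: {traffic_light!r}")
    # Return a copy so callers cannot mutate the routing table
    return list(ESCALATION[key])
```

Raising on an unknown value is a deliberate choice here: a misspelled status should fail loudly rather than silently notify nobody.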
Approach 2: Programmatic alerts from the notebook
If you want to send a richer notification with context - not just "the alert fired" but the actual metric values and a recommendation - you can send the notification directly from the monitoring notebook, either by email over SMTP (this approach) or through a chat webhook (Approach 3).
For Databricks environments with a configured SMTP gateway:
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart


def send_monitoring_alert(summary: dict, recipients: list[str]) -> None:
    """Send an email alert when the monitoring report is AMBER or RED."""
    traffic_light = summary["overall_traffic_light"]
    if traffic_light == "GREEN":
        return  # No alert needed

    subject = f"[{traffic_light}] Motor model monitoring - {summary['current_date']}"
    ae = summary["metrics"]["ae_ratio"]
    gini = summary["metrics"]["gini"]
    psi = summary["metrics"]["psi_score"]

    body = f"""
Model monitoring report for {summary['model_name']}
Period: {summary['reference_date']} to {summary['current_date']}
Status: {traffic_light}
METRICS:
A/E ratio: {ae['value']:.4f} (95% CI: [{ae['ci_lower']:.4f}, {ae['ci_upper']:.4f}]) {ae['traffic_light']}
Score PSI: {psi['value']:.4f} {psi['traffic_light']}
Gini (ref): {gini['gini_ref']:.4f}
Gini (cur): {gini['gini_cur']:.4f} p={gini['p_value']:.4f} {gini['traffic_light']}
TOP CSI FLAGS:
"""
    # List the five features with the largest CSI
    for item in sorted(summary["csi"], key=lambda x: x["csi"], reverse=True)[:5]:
        body += f"  {item['feature']:<25} CSI={item['csi']:.4f} {item['traffic_light']}\n"
    body += f"""
This is an automated alert from the Burning Cost monitoring pipeline.
Notebook: module-11-model-monitoring
"""

    msg = MIMEMultipart()
    msg["Subject"] = subject
    msg["From"] = "monitoring@yourcompany.com"
    msg["To"] = ", ".join(recipients)
    msg.attach(MIMEText(body, "plain"))

    # Replace with your SMTP server details
    # Store credentials in Databricks Secrets, not here
    smtp_host = dbutils.secrets.get(scope="monitoring", key="smtp_host")
    smtp_port = 587
    smtp_user = dbutils.secrets.get(scope="monitoring", key="smtp_user")
    smtp_pass = dbutils.secrets.get(scope="monitoring", key="smtp_pass")

    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.starttls()
        server.login(smtp_user, smtp_pass)
        server.sendmail(msg["From"], recipients, msg.as_string())
    print(f"Alert sent to {recipients}")


# Call this after generating the report summary
ALERT_RECIPIENTS = [
    "pricing@yourcompany.com",
    "head.of.pricing@yourcompany.com",
]
send_monitoring_alert(summary, ALERT_RECIPIENTS)
Store smtp_host, smtp_user, and smtp_pass in Databricks Secrets (under the scope monitoring). Never put credentials in a notebook.
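One way to check the alert content without an SMTP server or secrets is to separate building the message from sending it. The sketch below is an illustration, not the notebook's actual code: `build_alert_message` is a hypothetical helper, it uses the standard-library `email.message.EmailMessage` class rather than `MIMEMultipart`, and it assumes the same `summary` layout as the function above (only a subset of the fields is used).

```python
from email.message import EmailMessage


def build_alert_message(summary: dict, sender: str, recipients: list):
    """Build the alert email, or return None when the status is GREEN."""
    traffic_light = summary["overall_traffic_light"]
    if traffic_light == "GREEN":
        return None  # Nothing to send

    ae = summary["metrics"]["ae_ratio"]
    psi = summary["metrics"]["psi_score"]
    lines = [
        f"Model monitoring report for {summary['model_name']}",
        f"Status: {traffic_light}",
        f"A/E ratio: {ae['value']:.4f} {ae['traffic_light']}",
        f"Score PSI: {psi['value']:.4f} {psi['traffic_light']}",
    ]

    msg = EmailMessage()
    msg["Subject"] = f"[{traffic_light}] Motor model monitoring - {summary['current_date']}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("\n".join(lines))
    return msg
```

The returned object can be passed to `smtplib.SMTP.send_message`, so the sending step stays a few lines and the formatting logic can be tested on its own with a hand-written `summary` dictionary.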
Approach 3: Microsoft Teams or Slack webhook
If your team uses Teams or Slack, a webhook notification is often more effective than email because it reaches people where they already work:
import requests
import json


def send_teams_alert(summary: dict, webhook_url: str) -> None:
    """Post a monitoring summary card to a Teams channel."""
    traffic_light = summary["overall_traffic_light"]
    if traffic_light == "GREEN":
        return

    # Red card for RED, orange card for AMBER
    colour = "FF0000" if traffic_light == "RED" else "FFA500"
    ae = summary["metrics"]["ae_ratio"]

    card = {
        "@type": "MessageCard",
        "@context": "https://schema.org/extensions",
        "themeColor": colour,
        "summary": f"Model monitoring {traffic_light}: {summary['model_name']}",
        "sections": [{
            "activityTitle": f"Model monitoring: {traffic_light}",
            "activitySubtitle": (
                f"{summary['model_name']} - period ending {summary['current_date']}"
            ),
            "facts": [
                {"name": "A/E ratio", "value": f"{ae['value']:.4f} ({ae['traffic_light']})"},
                {"name": "Score PSI", "value": f"{summary['metrics']['psi_score']['value']:.4f}"},
                {"name": "Gini (cur)", "value": f"{summary['metrics']['gini']['gini_cur']:.4f}"},
            ],
        }],
    }

    response = requests.post(
        webhook_url,
        data=json.dumps(card),
        headers={"Content-Type": "application/json"},
        timeout=10,  # Do not let a slow webhook hang the monitoring job
    )
    if response.status_code != 200:
        print(f"Teams alert failed: {response.status_code} {response.text}")
    else:
        print("Teams alert sent.")


teams_webhook = dbutils.secrets.get(scope="monitoring", key="teams_webhook_url")
send_teams_alert(summary, teams_webhook)
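The same pattern carries over to Slack. The following is a minimal sketch, assuming a Slack incoming webhook that accepts a JSON body with a `text` field. The function names and the secret key `slack_webhook_url` are hypothetical, and the sketch uses the standard-library `urllib` instead of `requests` so it has no extra dependencies.

```python
import json
import urllib.request


def build_slack_payload(summary: dict):
    """Return the Slack message payload, or None when the status is GREEN."""
    traffic_light = summary["overall_traffic_light"]
    if traffic_light == "GREEN":
        return None
    ae = summary["metrics"]["ae_ratio"]
    psi = summary["metrics"]["psi_score"]
    text = (
        f"*Model monitoring: {traffic_light}*\n"
        f"{summary['model_name']} - period ending {summary['current_date']}\n"
        f"A/E ratio: {ae['value']:.4f} ({ae['traffic_light']})\n"
        f"Score PSI: {psi['value']:.4f}"
    )
    return {"text": text}


def send_slack_alert(summary: dict, webhook_url: str, timeout: float = 10) -> bool:
    """Post the payload to the webhook. Returns False if no alert was needed."""
    payload = build_slack_payload(summary)
    if payload is None:
        return False
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200


# In the notebook (assumed secret name):
# slack_webhook = dbutils.secrets.get(scope="monitoring", key="slack_webhook_url")
# send_slack_alert(summary, slack_webhook)
```

Keeping the payload builder separate from the HTTP call means the message text can be checked without posting anything to a channel.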
Choosing your alert strategy
Use SQL Alerts as the primary mechanism. They are low-maintenance, require no code to update, and are visible to anyone with access to the Databricks SQL workspace. The programmatic email/Teams approach supplements this: use it in the monitoring notebook for richer context when the overall result is amber or red.
Do not rely solely on programmatic alerts from the notebook. If the notebook fails before reaching the alert code, no alert fires. SQL Alerts query the Delta table, so they fire correctly even if a notebook run has to be re-run from a checkpoint.
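If you do use programmatic alerts, one partial mitigation is to wrap the monitoring step so that a failure before the alert code is itself reported on the same channel. This is a sketch under assumptions, not a replacement for SQL Alerts: `run_with_failure_alert` and its three callable arguments are hypothetical names standing in for your own monitoring and notification functions.

```python
def run_with_failure_alert(run_monitoring, send_alert, send_failure_notice):
    """Run monitoring, then alert on the result; report failures as well.

    run_monitoring:      callable returning the summary dict
    send_alert:          callable taking the summary (e.g. an email or chat sender)
    send_failure_notice: callable taking a short error message string
    """
    try:
        summary = run_monitoring()
    except Exception as exc:
        # The content alert can never fire now, so say so explicitly
        send_failure_notice(f"Monitoring failed before alerting: {exc!r}")
        raise  # Re-raise so the Databricks job is still marked as failed
    send_alert(summary)
    return summary
```

Re-raising keeps the job status accurate, so the job notifications from Part 11 continue to work alongside this wrapper.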