Part 16: Logging to MLflow

Everything that goes into production should be tracked. Log the model parameters, the validation results, the relativities, and the model itself to MLflow.

In a new cell, type this and run it (Shift+Enter):

import mlflow
from datetime import date

mlflow.set_experiment("/Users/your-username/pricing-module-04")

with mlflow.start_run(run_name="gbm_shap_relativities_v1") as run:

    # Log model parameters
    mlflow.log_params(freq_params)

    # Log validation results as metrics
    for check_name, result in checks.items():
        mlflow.log_metric(f"validation_{check_name}_passed", int(result.passed))

    # Log reconstruction error specifically
    recon_result = checks["reconstruction"]
    if hasattr(recon_result, "value"):
        mlflow.log_metric("reconstruction_max_error", float(recon_result.value))

    # Log the relativities table as a CSV artefact
    rels_path = "/tmp/gbm_relativities.csv"
    rels.to_csv(rels_path, index=False)
    mlflow.log_artifact(rels_path, "relativities")

    # Log Radar export files
    mlflow.log_artifact("/dbfs/tmp/gbm_relativities_radar.csv", "radar_export")
    mlflow.log_artifact("/dbfs/tmp/gbm_age_band_relativities_radar.csv", "radar_export")

    # Log key relativities as metrics for easy comparison
    area_f_rel = rels[rels["feature"] == "area"].set_index("level").loc["F", "relativity"]
    ncd5_rel   = rels[rels["feature"] == "ncd_years"].set_index("level").loc[5, "relativity"]
    conv_rel   = rels[rels["feature"] == "has_convictions"].set_index("level").loc[1, "relativity"]

    mlflow.log_metric("area_F_relativity", area_f_rel)
    mlflow.log_metric("ncd5_relativity", ncd5_rel)
    mlflow.log_metric("conviction_relativity", conv_rel)
    mlflow.log_metric("n_policies", len(df))

    # Log the CatBoost model itself
    mlflow.catboost.log_model(freq_model, "catboost_model")

    print(f"Run ID: {run.info.run_id}")
    print(f"Area F relativity logged:     {area_f_rel:.4f}")
    print(f"NCD=5 relativity logged:      {ncd5_rel:.4f}")
    print(f"Conviction relativity logged: {conv_rel:.4f}")

The cell prints the run ID and the three key relativities. To verify that the run was logged, open the MLflow UI (click the flask icon in the Databricks left sidebar, then Experiments) and select the run. You should see:

  • The model parameters (loss function, learning rate, etc.)
  • The validation pass/fail metrics
  • The key relativities as searchable metrics
  • The relativities CSV and Radar export files as downloadable artefacts
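
The same check can also be done from a notebook cell rather than by eye. The following is a minimal sketch, not part of the original module: `missing_from_run` and `check_run` are hypothetical helper names, and the expected metric and folder names are copied from the logging cell above. It assumes the standard `MlflowClient.get_run` and `MlflowClient.list_artifacts` calls. The logged model is left out of the check because where it is stored can depend on the MLflow version.

```python
# Names taken from the logging cell above.
EXPECTED_METRICS = {
    "area_F_relativity",
    "ncd5_relativity",
    "conviction_relativity",
    "n_policies",
}
EXPECTED_ARTIFACT_DIRS = {"relativities", "radar_export"}


def missing_from_run(metric_names, artifact_paths):
    """Return the expected metrics and artefact folders that were not logged."""
    return {
        "metrics": sorted(EXPECTED_METRICS - set(metric_names)),
        "artifacts": sorted(EXPECTED_ARTIFACT_DIRS - set(artifact_paths)),
    }


def check_run(run_id):
    """Fetch a run from the tracking server and report anything missing."""
    # Imported here so missing_from_run can be used without MLflow installed.
    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    run = client.get_run(run_id)
    # list_artifacts with no path returns the top-level entries of the run.
    top_level = [info.path for info in client.list_artifacts(run_id)]
    return missing_from_run(run.data.metrics.keys(), top_level)
```

Calling `check_run(run.info.run_id)` in the cell after the logging block should return empty lists for both keys if everything was logged.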

Why log relativities to MLflow and not just Unity Catalog? MLflow gives you the link between the specific model run (the exact training data, the exact hyperparameters) and the relativities derived from it. If someone later asks "which model version did the area F = 1.95 relativity come from?", you can find it in MLflow. Unity Catalog stores the relativity table for querying; MLflow stores the audit trail.
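
Because the key relativities are logged as metrics, a question like the one above can be answered with a search. The sketch below is one possible approach, not the module's own code: `relativity_filter` and `find_runs` are hypothetical helpers, and the tolerance of 0.005 is an assumption that treats the quoted 1.95 as a value rounded to two decimals. It assumes an MLflow version in which `mlflow.search_runs` accepts `experiment_names`.

```python
def relativity_filter(metric_name, value, tol=0.005):
    """Build an MLflow filter string matching a metric within +/- tol of value."""
    lo, hi = value - tol, value + tol
    return (
        f"metrics.{metric_name} >= {lo:.4f} "
        f"AND metrics.{metric_name} <= {hi:.4f}"
    )


def find_runs(experiment_name, metric_name, value, tol=0.005):
    """Return a DataFrame of runs whose logged metric is close to value."""
    # Imported here so relativity_filter can be used without MLflow installed.
    import mlflow

    return mlflow.search_runs(
        experiment_names=[experiment_name],
        filter_string=relativity_filter(metric_name, value, tol),
    )
```

For example, `find_runs("/Users/your-username/pricing-module-04", "area_F_relativity", 1.95)` would return one row per matching run, including its `run_id` and start time, from which the parameters and artefacts of that run can be opened in the MLflow UI.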