Stochastic Reserving in Python: Mack and Bootstrap ODP with chainladder

Part 1 of this series showed how to build and fit a chain ladder model on the RAA triangle using the chainladder library, and how to extract the IBNR estimate of 52,135. That figure is a point estimate. It tells you the expected shortfall between current reserves and ultimate losses. It does not tell you how wrong it could be.

This post addresses that gap. We cover two methods for quantifying reserve uncertainty:

Mack — an analytical approach that derives standard errors directly from the development pattern, without simulation
Bootstrap ODP — a simulation method that generates a full empirical distribution of possible reserve outcomes

By the end you will have a percentile table covering the 50th through 99.5th percentiles of total IBNR, the numbers you need for a Solvency II SCR calculation or an IFRS 17 risk adjustment.

Why the point estimate is not enough

The chain ladder gives you an expected reserve. That is the right number for setting a best estimate liability. But capital, pricing, and accounting all require more.

Solvency II SCR. Under Article 101 of the Directive, the Solvency Capital Requirement is calibrated to a 99.5th percentile loss over a one-year horizon. For reserve risk, that means you need to know where the 99.5th percentile of the reserve distribution sits relative to your best estimate. The gap between the two is roughly your reserve SCR before diversification.

IFRS 17 risk adjustment. Under IFRS 17, the risk adjustment for non-financial risk represents the compensation an insurer requires for bearing uncertainty in claims cash flows. It must be disclosed with reference to a confidence level, which requires a quantified reserve distribution.

Internal management. Capital allocation, re-insurance purchasing, and business planning all depend on understanding how bad a reserve adverse development scenario could be — not just where the central estimate sits.

The Mack method and Bootstrap ODP are the two workhorses for answering these questions at the aggregate triangle level. Both are implemented in chainladder and both take about a dozen lines of Python.

Setup and the pandas compatibility fix

chainladder 0.9.1 (the current release) has a timestamp precision mismatch with pandas 3.x: internally it stores valuation dates at microsecond precision but pandas 3.0 uses nanoseconds by default, which causes the diagonal selector to return an empty slice and raise ValueError: Slice returns empty Triangle. The fix is a one-line patch applied before any triangle fitting.

import chainladder as cl
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats

# -----------------------------------------------------------------------
# Patch: fix microsecond/nanosecond mismatch in chainladder 0.9.1
# with pandas 3.x. Floor valuation timestamps to microsecond precision
# so the diagonal selector comparison works correctly.
# -----------------------------------------------------------------------
_orig_valuation = cl.Triangle.valuation.fget

def _patched_valuation(self):
    return _orig_valuation(self).floor("us")

cl.Triangle.valuation = property(_patched_valuation)

Apply this once at the top of your script. Every subsequent fit() call works normally.

# Load the RAA triangle — the same dataset used in Part 1
raa = cl.load_sample("raa")
print(raa)

         12       24       36       48       60       72       84       96       108      120
1981  5012.0   8269.0  10907.0  11805.0  13539.0  16181.0  18009.0  18608.0  18662.0  18834.0
1982   106.0   4285.0   5396.0  10666.0  13782.0  15599.0  15496.0  16169.0  16704.0      NaN
1983  3410.0   8992.0  13873.0  16141.0  18735.0  22214.0  22863.0  23466.0      NaN      NaN
...
1990  2063.0      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN      NaN

Ten accident years (1981–1990), ten development periods (12–120 months). The 1981 accident year is fully developed. The 1990 accident year has only the 12-month diagonal.

Method 1: Mack chain ladder

The Mack method (Thomas Mack, 1993) is an analytical extension of the chain ladder. It derives standard errors for each origin year’s IBNR estimate without making distributional assumptions — treating the triangle as a distribution-free model and computing variance from the weighted residuals around each development factor.

The key decomposition is:

Parameter variance — uncertainty in the development factors themselves, arising because we estimate them from a finite number of years
Process variance — inherent randomness in future claims even if the development factors were known exactly

The sum of these gives the Mack standard error per origin year. Squared and summed across years (with a covariance correction), this gives the total reserve standard error.

mack = cl.MackChainladder().fit(raa)
print(mack.summary_)

       Latest          IBNR      Ultimate  Mack Std Err
18834.0           NaN  18834.000000           NaN
16704.0    153.953917  16857.953917    142.931716
23466.0    617.370924  24083.370924    592.148304
27067.0   1636.142163  28703.142163    712.853921
26180.0   2746.736343  28926.736343   1452.090330
15852.0   3649.103184  19501.103184   1994.987807
12314.0   5435.302590  17749.302590   2203.838469
13112.0  10907.192510  24019.192510   5354.340512
 5395.0  10649.984101  16044.984101   6331.543044
 2063.0  16339.442529  18402.442529  24565.775709

The IBNR figures match the chain ladder point estimates from Part 1 exactly — Mack gives identical central estimates, just adds standard errors alongside them. Notice that 1990 has the largest standard error (24,566) despite only having the second-largest IBNR (16,339). That is because it has only one diagonal of data; almost all its reserve is uncertain.

total_ibnr = float(np.nansum(mack.ibnr_.values))
total_se   = float(mack.total_mack_std_err_.values.flatten()[0])

print(f"Total IBNR:           {total_ibnr:>10,.0f}")
print(f"Total Mack Std Err:   {total_se:>10,.0f}")
print(f"Coefficient of Var:   {total_se / total_ibnr:>10.1%}")

Total IBNR:               52,135
Total Mack Std Err:       26,881
Coefficient of Var:         51.6%

A CV of 51.6% is high. On a ten-year triangle with thin early diagonals, this is realistic — the 1990 accident year alone has a standard error larger than its IBNR. On a mature, stable portfolio with 15+ accident years, you would typically see CVs of 10–25%.

Converting Mack standard errors to percentiles

Mack does not specify a distribution. The conventional actuarial practice is to fit a lognormal distribution matched to the mean and standard error. This is a common exam standard (CT4/ST7 syllabus), accepted in PRA submissions, and reasonable for unimodal, right-skewed reserve distributions.

# Lognormal parameterisation: match mean and variance to Mack outputs
cv2   = (total_se / total_ibnr) ** 2
sigma = np.sqrt(np.log(1 + cv2))          # lognormal shape
mu    = np.log(total_ibnr) - 0.5 * sigma**2  # lognormal location

mack_percentiles = {}
for p in [50, 75, 90, 95, 99.5]:
    mack_percentiles[p] = stats.lognorm.ppf(p / 100, s=sigma, scale=np.exp(mu))
    print(f"Mack {p:>5.1f}th: {mack_percentiles[p]:>10,.0f}")

Mack  50.0th:     46,338
Mack  75.0th:     64,293
Mack  90.0th:     86,332
Mack  95.0th:    102,986
Mack  99.5th:    161,840

The lognormal median (46,338) sits below the mean (52,135) — this is always true for a lognormal; the distribution is right-skewed. The 99.5th percentile of 161,840 is three times the central estimate, illustrating the capital requirement implied by reserve uncertainty on a small, immature triangle.

Method 2: Bootstrap ODP

The Over-Dispersed Poisson bootstrap (England and Verrall, 2002) is a simulation method. Rather than deriving standard errors analytically, it:

Fits the chain ladder to the original triangle and computes Pearson residuals
Samples from those residuals with replacement to construct 5,000 pseudo-triangles
Fits the chain ladder to each pseudo-triangle
Projects each pseudo-triangle to ultimate

The resulting 5,000 total IBNR values form an empirical distribution. No distributional assumption is required — the shape falls out of the data.

In chainladder, BootstrapODPSample is a development transformer (it produces simulated triangles), not a reserving method. The workflow is: bootstrap to get simulated triangles, then fit Chainladder to those triangles.

# Step 1: generate 5,000 bootstrapped triangles from the RAA data
boot = cl.BootstrapODPSample(n_sims=5000, random_state=42)
boot.fit(raa)
sims = boot.transform(raa)

print(f"Simulated triangles shape: {sims.shape}")
# (5000, 1, 10, 10) — 5,000 simulations, 1 column, 10 origins, 10 dev periods

Simulated triangles shape: (5000, 1, 10, 10)

# Step 2: fit chain ladder to all 5,000 simulated triangles simultaneously
cl_sims = cl.Chainladder().fit(sims)

# Step 3: extract total IBNR per simulation
# ibnr_ has shape (5000, 1, 10, 1) — sum across origin axis to get total per sim
total_ibnr_sims = cl_sims.ibnr_.sum(axis=2).values.flatten()
total_ibnr_sims = total_ibnr_sims[~np.isnan(total_ibnr_sims)]

print(f"Simulations retained: {len(total_ibnr_sims):,}")
print(f"Mean IBNR:   {np.mean(total_ibnr_sims):>10,.0f}")
print(f"Std dev:     {np.std(total_ibnr_sims):>10,.0f}")

Simulations retained: 5,000
Mean IBNR:     53,826
Std dev:       19,059

The mean of 53,826 sits close to the chain ladder point estimate (52,135). The slight upward bias is expected from the bootstrap resampling — the ODP bootstrap consistently overestimates slightly because the pseudo-triangles include process noise that the point estimate does not.

The percentile table

boot_percentiles = {}
for p in [50, 75, 90, 95, 99.5]:
    boot_percentiles[p] = np.percentile(total_ibnr_sims, p)
    print(f"Bootstrap {p:>5.1f}th: {boot_percentiles[p]:>10,.0f}")

Bootstrap  50.0th:     51,587
Bootstrap  75.0th:     65,111
Bootstrap  90.0th:     79,636
Bootstrap  95.0th:     87,589
Bootstrap  99.5th:    111,466

Histogram of the distribution

fig, ax = plt.subplots(figsize=(10, 5))

ax.hist(total_ibnr_sims, bins=80, color="#3b82f6", alpha=0.7, edgecolor="white",
        linewidth=0.4)

# Mark key percentiles
colours = {50: "#64748b", 75: "#f59e0b", 95: "#ef4444", 99.5: "#7c3aed"}
labels  = {50: "50th", 75: "75th", 95: "95th", 99.5: "99.5th"}
for p, col in colours.items():
    val = boot_percentiles[p]
    ax.axvline(val, color=col, linewidth=1.8, linestyle="--",
               label=f"{labels[p]} pctl: {val:,.0f}")

ax.axvline(total_ibnr, color="black", linewidth=2,
           label=f"CL point estimate: {total_ibnr:,.0f}")

ax.set_xlabel("Total IBNR", fontsize=12)
ax.set_ylabel("Frequency (5,000 simulations)", fontsize=12)
ax.set_title("RAA Triangle — Bootstrap ODP IBNR Distribution", fontsize=13)
ax.legend(fontsize=10)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"{x/1000:.0f}k"))
plt.tight_layout()
plt.savefig("raa_bootstrap_ibnr.png", dpi=150)
plt.show()

The histogram is right-skewed — a thin but heavy right tail extends well past 100,000. Most of the probability mass sits between 30,000 and 80,000 but the 99.5th percentile is more than twice the median.

Mack vs Bootstrap: a comparison

comparison = pd.DataFrame({
    "Percentile": [50, 75, 90, 95, 99.5],
    "Mack (lognormal)": [mack_percentiles[p] for p in [50, 75, 90, 95, 99.5]],
    "Bootstrap ODP":    [boot_percentiles[p]  for p in [50, 75, 90, 95, 99.5]],
})
comparison["Ratio (B/M)"] = (
    comparison["Bootstrap ODP"] / comparison["Mack (lognormal)"]
).map("{:.2f}".format)
comparison["Mack (lognormal)"] = comparison["Mack (lognormal)"].map("{:,.0f}".format)
comparison["Bootstrap ODP"]    = comparison["Bootstrap ODP"].map("{:,.0f}".format)
print(comparison.to_string(index=False))

 Percentile Mack (lognormal) Bootstrap ODP Ratio (B/M)
         50           46,338        51,587        1.11
         75           64,293        65,111        1.01
         90           86,332        79,636        0.92
         95          102,986        87,589        0.85
         99.5         161,840       111,466        0.69

The methods agree reasonably well in the middle of the distribution. The divergence at the tail is the interesting result: Mack (with a lognormal overlay) produces a heavier tail than the bootstrap. At the 99.5th percentile, Mack gives 161,840 versus 111,466 from the bootstrap — a 45% difference.

This is a well-known pattern. The lognormal overlay on Mack is somewhat arbitrary — a log-t or inverse Gaussian can be more appropriate depending on the triangle — and the lognormal tail is inherently heavier than the empirical bootstrap distribution. England and Verrall (2002) found similar divergence: the bootstrap’s empirical tail is bounded by the range of observed residuals, while a parametric Mack distribution can extend further.

Which to use?

Use Mack when you need a quick, closed-form answer that does not depend on simulation stability. Mack is also the standard in many PRA internal model submissions and has an explicit analytical interpretation. The lognormal overlay is accepted practice at Lloyd’s and in most UK reserving opinions.

Use Bootstrap ODP when you want an empirical distribution and are comfortable with the interpretation: these are the outcomes that are consistent with your observed residuals, not a parametric extrapolation. It is also better behaved when the triangle has outlier diagonals that distort the lognormal fit, and it naturally extends to including process variance (the BootstrapODPSample draws residuals from the ODP model’s process variance as well as parameter variance).

Do not use either as a substitute for scenario analysis. Both methods assume the future will look like the past. If you have a known emerging liability — a new legal precedent, a change in settlement rates, a line in run-off — the stochastic framework will not capture it. Supplement with deterministic stress scenarios.

A note on the `BootstrapODPSample` warning

When you run the bootstrap on the RAA triangle you will see a RuntimeWarning: invalid value encountered in sqrt. This comes from the hat matrix adjustment in the standardised residual calculation. A handful of cells in the RAA triangle produce leverage factors close to 1, which causes the denominator in the adjustment to approach zero. The warning is expected for small triangles and does not affect the simulation results materially. You can suppress it with:

import warnings
warnings.filterwarnings("ignore", category=RuntimeWarning,
                        module="chainladder")

Or leave it — it is informative rather than alarming.

What’s next

Part 3 (coming soon) will cover the Bornhuetter-Ferguson method — the industry’s preferred approach when the chain ladder is unreliable on thin or volatile triangles. BF blends a prior expected loss ratio with the emerging development pattern, borrowing strength from your pricing assumptions to stabilise immature accident years. We will fit it to the RAA triangle, compare it with the chain ladder, and show how to set and test the prior assumption.

reservingIBNRstochastic-reservingbootstrap-ODPMackchainladderpythonSolvency-IIIFRS-17tutorial

Back to all articles