Every series of posts on credibility eventually reaches the same wall. Bühlmann-Straub gives you a point estimate. Poisson-Gamma conjugacy gives you an exact posterior, but only under conjugacy — Poisson counts, Gamma priors, no crossed effects. When you step outside those constraints — Gamma severity, crossed NCD × territory effects, time-varying risk — you reach for MCMC. And MCMC works; it is just slow. For a commercial lines portfolio with 150 schemes, a well-specified PyMC model takes 20–40 minutes to sample. Run that on every segment at renewal and you have a process measured in days, not the sub-second latency that production pricing systems require.

Habermann et al. at the University of Tübingen published a paper in August 2024 that directly addresses this (arXiv:2408.13230), which appeared at ICLR 2025. The core idea is amortization: train a neural network once on simulated data from your prior, then use it to compute posterior distributions for any new observed dataset in a single forward pass. The training cost is borne once. Subsequent inference is milliseconds.


What amortization means

Classical MCMC starts from scratch with every new dataset. Given observations y_new, you run a Markov chain until it converges to the posterior p(θ | y_new). For hierarchical models, convergence requires thousands of warm-up steps, and the geometry of the posterior — particularly the funnel-shaped dependence between group-level and hyper-level parameters — makes NUTS work hard. The result is correct but expensive, and the cost is paid every time you want a posterior for a new dataset.

Amortized inference shifts the computation. Instead of running a chain, you train an inference network q_φ(θ | y) parameterized by weights φ to approximate the true posterior. The training objective minimizes the expected KL divergence between q_φ(θ | y) and p(θ | y) across the prior predictive distribution — you simulate many (θ, y) pairs from the model and train the network to map y to the approximate posterior over θ.

After training, inference is a forward pass: feed y_new into the network, get back a posterior approximation. No chain, no warm-up, no convergence diagnostics.

The critical point is model specificity. The trained network is a posterior approximator for a particular model family — a particular prior and likelihood structure. You cannot take a network trained for Bühlmann-Straub and use it to approximate posteriors for a spatial CAR model. But if your commercial lines credibility problem has a fixed hierarchical structure that you apply to hundreds of scheme segments annually, training once and amortizing across those segments is exactly the right trade-off.


The Habermann et al. paper

The paper — “Amortized Bayesian Multilevel Models” — focuses specifically on multilevel (hierarchical) Bayesian models. These are the same structures used in actuarial credibility: group-level parameters θᵢ drawn from a population-level prior governed by hyperparameters τ. The evidence for each group is its own data yᵢ, and the posterior over θᵢ should borrow strength across groups via the shared hyperparameters.

The contribution is a network architecture that handles the characteristic features of multilevel models:

The authors evaluate on synthetic benchmarks and real datasets including sleep study data (repeated measures, mixed effects) and scholastic assessment data (school-level random effects). The inference network recovers the posterior mean and uncertainty intervals that match Stan’s NUTS output at a fraction of the compute cost post-training.

We cannot verify the claim that the network generalizes directly to insurance-specific structures without implementation work. The paper does not contain an insurance experiment. What it establishes is that amortized inference is viable for the class of models — hierarchical, mixed-effects — that underlie actuarial credibility theory.


The insurance gap this fills

To be concrete about where this sits in the credibility toolkit:

Bühlmann-Straub is a point estimate. Technically the optimal linear Bayes estimator under squared error loss. Fast, interpretable, well-understood by actuaries and regulators. No uncertainty quantification; hyperparameter uncertainty silently ignored.

Poisson-Gamma conjugacy gives an exact posterior for Poisson counts. PoissonGammaCredibility in insurance-credibility v0.1.8 implements this. Still restricted to the Poisson-Gamma family — Gamma severity or negative-binomial frequency requires a different conjugate family, and some combinations have no conjugate form at all.

Full Bayesian via MCMC (PyMC, Stan) gives the exact posterior for any model you can specify. The constraint is time: a non-trivial hierarchical model for a commercial lines portfolio takes tens of minutes. Acceptable for an annual rerate. Not acceptable for an automated renewal workflow or a capital model that needs to reprice 50,000 schemes.

Amortized inference sits between the conjugate approach and full MCMC. It requires upfront training cost — the prior-to-posterior approximation has to be learned, which takes hours the first time. After that, inference is a forward pass. For a portfolio repriced daily or weekly, that trade-off is highly favourable.

The remaining constraint: you need a prior specification. If you cannot describe a plausible generative model for how scheme-level claim rates are distributed across your portfolio, amortized inference does not help. The same constraint applies to MCMC; it just becomes more visible here because the prior is baked into the training data rather than specified interactively.


Why non-conjugate credibility matters

The standard justification for Poisson-Gamma is that claim counts follow Poisson processes. This is approximately correct for homogeneous exposure. In practice, UK commercial lines work involves:

For all of these, the current choice is either MCMC (exact but slow) or a linear approximation like B-S (fast but approximate). Amortized inference would offer a third path: train a model-specific network once, then price in milliseconds with a full posterior.


What we do not have yet

The honest assessment: as of April 2026, there is no insurance-specific amortized credibility implementation available in insurance-credibility or anywhere else in the Python actuarial ecosystem. The gap identified in our March research scan remains open.

BayesFlow v2 (the library from the Radev group at Tübingen, MIT licence) provides the building blocks — inference network architectures, summary networks for sequential data, training loops for amortized posterior approximation. The work required is to specify a realistic insurance prior (claim rate distribution across schemes, plausible variance components), generate training data from that prior, and validate that the network recovers posteriors consistent with Stan on a holdout set.

That is not trivial. Prior specification for insurance models requires decisions that a generic research library cannot make for you: what is the realistic coefficient of variation of scheme-level claim rates across your commercial book? What is the distribution of group sizes (exposure per scheme)? How much does the between-group variance change across lines of business? Getting these wrong produces a trained network that is fast but confidently wrong — the actuarial equivalent of a confident MCMC chain that has not converged.

We plan to build this as an extension to insurance-credibility. The architecture is BayesFlow v2, the target is Gamma severity and negative-binomial frequency as the immediate use cases beyond Poisson-Gamma. No timeline commitment in this post.


The broader direction

Amortized inference is part of a wider shift in probabilistic machine learning toward simulation-based inference — learning posteriors from simulated data rather than deriving them analytically. The companion paper by Arruda et al. (arXiv:2505.14429, May 2025) extends this to compositional amortization, which scales to 100,000+ data points and 10,000+ parameters via diffusion-model-based score matching. For a spatial postcode-sector model with 9,500 UK sectors, that matters.

The practical insurance application is not just renewal pricing. Capital models that quantify uncertainty in ultimate losses at segment level are also hierarchical Bayesian models. Today, those models either approximate credibility with B-S point estimates (fast but uncertain about their uncertainty) or run full MCMC and accept the compute cost. Amortized inference would allow capital models to run with full posterior uncertainty at production speed — which is a different quality of risk quantification.

We are not claiming this is solved. The paper demonstrates viability on standard multilevel benchmark problems. The insurance path requires prior specification work, validation against MCMC on real insurance data, and integration into a production pricing framework. None of that is in arXiv:2408.13230. What is in the paper is the neural architecture and training methodology that makes it achievable.


Where we are

The credibility series so far has moved from classical Bühlmann-Straub, through benchmarking on freMTPL2, to conjugate Bayesian credibility with uncertainty intervals, to the question of what happens when neither conjugacy nor MCMC speed is acceptable.

Amortized inference is the answer to that question. It is not fully implemented in the insurance context yet. The paper that makes it tractable — Habermann et al., ICLR 2025 — was published eight months ago and has not yet been adapted for actuarial use. We think it should be. The PoissonGammaCredibility class in insurance-credibility was the first step toward a library where the Bayesian answer is as fast as the classical answer. Amortized credibility for non-conjugate cases is the next step.

The code does not exist yet. The mathematics does.


Related: