MCMC Autofit

Ledger’s Bayesian modeling infrastructure includes an autofit procedure, which checks a fitted model’s convergence diagnostics against a set of standard thresholds and, if any have failed, attempts to tune certain MCMC parameters and re-fit the model. This reduces the burden on remote users of constantly checking for convergence. The autofit arguments are exposed to users via the configuration dictionary passed to create methods.

In general, we recommend users only change the autofit parameters if they are familiar with Hamiltonian Monte Carlo (HMC) tuning parameters. The parameters are controlled by the AutofitControl class, and the default parameters are:

>>> from ledger_analytics import AutofitControl
>>> AutofitControl().__dict__

    {'samples_per_chain': 2500,
     'warmup_per_chain': None,
     'adapt_delta': 0.8,
     'max_treedepth': 10,
     'thin': 1,
     'max_adapt_delta': 0.99,
     'max_max_treedepth': 15,
     'max_samples_per_chain': 4000,
     'chains': 4,
     'divergence_rate_threshold': 0.0,
     'treedepth_rate_threshold': 0.0,
     'ebfmi_threshold': 0.2,
     'min_ess': 1000,
     'max_rhat': 1.05}
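To make the threshold fields concrete, here is a minimal, hypothetical sketch of the kind of convergence check these defaults imply. The `Diagnostics` class and `passes_thresholds` function below are illustrative names only, not part of the ledger_analytics API:

```python
from dataclasses import dataclass


@dataclass
class Diagnostics:
    """Illustrative container for post-fit MCMC diagnostics."""

    divergence_rate: float  # fraction of post-warmup iterations that diverged
    treedepth_rate: float   # fraction of iterations hitting max_treedepth
    ebfmi: float            # energy-based fraction of missing information
    min_ess: float          # smallest effective sample size across parameters
    max_rhat: float         # largest Rhat across parameters


def passes_thresholds(d: Diagnostics) -> bool:
    """Return True when all diagnostics meet the default thresholds above."""
    return (
        d.divergence_rate <= 0.0    # divergence_rate_threshold
        and d.treedepth_rate <= 0.0  # treedepth_rate_threshold
        and d.ebfmi >= 0.2           # ebfmi_threshold
        and d.min_ess >= 1000        # min_ess
        and d.max_rhat <= 1.05       # max_rhat
    )


clean = Diagnostics(0.0, 0.0, 0.9, 2400.0, 1.01)
noisy = Diagnostics(0.01, 0.0, 0.9, 2400.0, 1.01)  # a few divergences
print(passes_thresholds(clean), passes_thresholds(noisy))
```

When any check fails, autofit adjusts the tunable parameters (within the max_* caps) and re-fits.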

If you want to prevent autofit from refitting models, you can set a configuration such as (for an example model):

client.development_model.create(
    ...,
    config={
        "autofit_override": {
            "max_samples_per_chain": 2500,
            "max_max_treedepth": 10,
            "max_adapt_delta": 0.8,
        }
    },
)

which pins the maximum samples per chain, maximum treedepth, and HMC target acceptance rate at their initial values, so autofit cannot tune them to seek convergence. Note, users pass the autofit arguments into the configuration as a regular Python dictionary. Behind the scenes, this dictionary is validated using the AutofitControl class.

You can see the API documentation for the AutofitControl class below.

class ledger_analytics.AutofitControl(*, samples_per_chain: int = 2500, warmup_per_chain: int | None = None, adapt_delta: float = 0.8, max_treedepth: int = 10, thin: int = 1, max_adapt_delta: float = 0.99, max_max_treedepth: int = 15, max_samples_per_chain: int = 4000, chains: int = 4, divergence_rate_threshold: float = 0.0, treedepth_rate_threshold: float = 0.0, ebfmi_threshold: float = 0.2, min_ess: int = 1000, max_rhat: float = 1.05)

The HMC (MCMC) autofitting parameters.

The class holds the parameters users can tune in the autofit procedure. Only use this class if you are confident with HMC tuning parameters. When creating models, the intention is that users pass the autofit parameters as a Python dict rather than use this class directly.

If you want to turn off the autofitting procedure completely, you can use a configuration such as:

AutofitControl(
    samples_per_chain=1000,
    max_samples_per_chain=1000,
    max_adapt_delta=0.8,
    max_max_treedepth=10,
)
samples_per_chain

the number of posterior samples per chain.

Type:

int

warmup_per_chain

the number of warmup samples per chain. If None, defaults to half the posterior samples_per_chain.

Type:

int | None

adapt_delta

the initial HMC target average proposal acceptance probability.

Type:

float

max_treedepth

the initial maximum depth of the binary trees.

Type:

int

thin

the posterior samples thinning interval. Recommended to stay at 1.

Type:

int

max_adapt_delta

the maximum adapt_delta value to try.

Type:

float

max_max_treedepth

the maximum max_treedepth value to try.

Type:

int

max_samples_per_chain

the maximum samples_per_chain to try.

Type:

int

chains

the number of MCMC chains.

Type:

int

divergence_rate_threshold

the threshold rate of allowed divergent transitions.

Type:

float

treedepth_rate_threshold

the threshold rate of allowed iterations hitting the max_treedepth.

Type:

float

ebfmi_threshold

the minimum allowed value of the EBFMI (energy) diagnostic.

Type:

float

min_ess

the minimum effective sample size required.

Type:

int

max_rhat

the maximum allowed Rhat diagnostic.

Type:

float
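To see how the max_* caps bound the tuning, here is a minimal, hypothetical sketch of one way a retune step could escalate adapt_delta toward its cap after a failed fit. The actual autofit schedule is internal to Ledger; `next_adapt_delta` is an illustrative name, not part of the API:

```python
def next_adapt_delta(current: float, max_adapt_delta: float = 0.99) -> float:
    """Move halfway from the current value toward the cap, never past it."""
    return min(max_adapt_delta, current + 0.5 * (max_adapt_delta - current))


# Starting from the default adapt_delta of 0.8, three failed fits would
# walk the target acceptance rate upward, always staying within the cap.
delta = 0.8
for _ in range(3):
    delta = next_adapt_delta(delta)
print(delta)
```

Setting max_adapt_delta equal to the initial adapt_delta (as in the disable example above) makes every such step a no-op, which is why that configuration prevents retuning.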