MCMC Autofit¶
Ledger’s Bayesian modeling infrastructure includes an autofit
procedure, which checks a fitted model’s convergence diagnostics
against a set of standard thresholds and, if any of them fail,
attempts to tune certain MCMC parameters and re-fit the model.
This aims to reduce the burden on remote users of constantly
checking for convergence.
The autofit arguments are exposed to users via the configuration
dictionary passed to create
methods.
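For example, here is a minimal sketch of passing an override when creating a model (the model type and other arguments are placeholders, and the values are illustrative only):
client.development_model.create(
    ...,
    config={
        "autofit_override": {
            # start sampling with a higher HMC target acceptance rate
            "adapt_delta": 0.95,
            # and require a larger effective sample size before the fit passes diagnostics
            "min_ess": 2000,
        }
    },
)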
In general, we recommend users only change the autofit parameters
if they are familiar with Hamiltonian Monte Carlo (HMC) tuning
parameters. The parameters are controlled by the AutofitControl
class, and the defaults are:
>>> from ledger_analytics import AutofitControl
>>> AutofitControl().__dict__
{'samples_per_chain': 2500,
'warmup_per_chain': None,
'adapt_delta': 0.8,
'max_treedepth': 10,
'thin': 1,
'max_adapt_delta': 0.99,
'max_max_treedepth': 15,
'max_samples_per_chain': 4000,
'chains': 4,
'divergence_rate_threshold': 0.0,
'treedepth_rate_threshold': 0.0,
'ebfmi_threshold': 0.2,
'min_ess': 1000,
'max_rhat': 1.05}
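Since AutofitControl is a plain keyword-argument class, you can also construct it locally with your own values to see how they combine with the defaults (the values below are illustrative only):
>>> control = AutofitControl(adapt_delta=0.9, min_ess=2000)
>>> control.adapt_delta, control.min_ess, control.max_rhat
(0.9, 2000, 1.05)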
If you want to stop autofit from refitting models, you can set a configuration such as the following (for an example model):
client.development_model.create(
    ...,
    config={
        "autofit_override": {
            "max_samples_per_chain": 2500,
            "max_max_treedepth": 10,
            "max_adapt_delta": 0.8,
        }
    },
)
which stops the samples per chain, maximum treedepth
and HMC target acceptance rate (adapt_delta) parameters from being
tuned in search of convergence, because each cap is set equal to its initial value.
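Conversely, if you would rather give autofit more room to seek convergence, you can raise the caps instead of lowering them. A sketch with purely illustrative values:
client.development_model.create(
    ...,
    config={
        "autofit_override": {
            # allow more posterior samples per chain on re-fits
            "max_samples_per_chain": 6000,
            # allow the target acceptance rate to be pushed higher on re-fits
            "max_adapt_delta": 0.995,
        }
    },
)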
Note that users pass the autofit arguments into the configuration
as a regular Python dictionary. Behind the scenes, this dictionary
is validated using the AutofitControl class.
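Because AutofitControl accepts the same keyword arguments, you can run a similar check locally before submitting a fit by unpacking your override dictionary into the class (exactly which errors are raised for invalid input depends on the class's validation rules):
>>> override = {"max_samples_per_chain": 2500, "max_max_treedepth": 10}
>>> AutofitControl(**override).max_samples_per_chain
2500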
You can see the API documentation for the AutofitControl
class below.
- class ledger_analytics.AutofitControl(*, samples_per_chain: int = 2500, warmup_per_chain: int | None = None, adapt_delta: float = 0.8, max_treedepth: int = 10, thin: int = 1, max_adapt_delta: float = 0.99, max_max_treedepth: int = 15, max_samples_per_chain: int = 4000, chains: int = 4, divergence_rate_threshold: float = 0.0, treedepth_rate_threshold: float = 0.0, ebfmi_threshold: float = 0.2, min_ess: int = 1000, max_rhat: float = 1.05)¶
The HMC (MCMC) autofitting parameters.
The class holds the parameters users can tune in the autofit procedure. Only use this class if you feel confident with HMC tuning parameters. When creating models, the intention is that users pass the autofit parameters as a Python dict rather than use this class directly.
If you want to turn off the autofitting procedure completely, you can use a configuration such as:
AutofitControl(
    samples_per_chain=1000,
    max_samples_per_chain=1000,
    max_adapt_delta=0.8,
    max_max_treedepth=10,
)
- samples_per_chain¶
the number of posterior samples per chain.
- Type:
int
- warmup_per_chain¶
the number of warmup samples per chain. If None, defaults to half of samples_per_chain.
- Type:
int | None
- adapt_delta¶
the initial HMC target average proposal acceptance probability.
- Type:
float
- max_treedepth¶
the initial maximum depth of the binary trees.
- Type:
int
- thin¶
the posterior samples thinning interval. Recommended to keep at 1.
- Type:
int
- max_adapt_delta¶
the maximum adapt_delta value to try.
- Type:
float
- max_max_treedepth¶
the maximum max_treedepth value to try.
- Type:
int
- max_samples_per_chain¶
the maximum samples_per_chain to try.
- Type:
int
- chains¶
the number of MCMC chains.
- Type:
int
- divergence_rate_threshold¶
the threshold on the allowed rate of divergent transitions.
- Type:
float
- treedepth_rate_threshold¶
the threshold on the allowed rate of iterations hitting max_treedepth.
- Type:
float
- ebfmi_threshold¶
the minimum required value of the EBFMI diagnostic.
- Type:
float
- min_ess¶
the minimum effective sample size required.
- Type:
int
- max_rhat¶
the maximum Rhat diagnostic.
- Type:
float