calibration


Define the calibration class

Classes

Name Description
CalibComponent A class to compare a single channel of observed data with output from a simulation.
Calibration A class to handle calibration of Starsim simulations. Uses the Optuna hyperparameter optimization library (optuna.org).

CalibComponent

calibration.CalibComponent(
    self,
    name,
    expected,
    extract_fn,
    conform,
    weight=1,
    include_fn=None,
    n_boot=None,
    combine_reps=None,
)

A class to compare a single channel of observed data with output from a simulation. The Calibration class can use several CalibComponent objects to form an overall understanding of how well a given simulation reflects observed data.

Parameters

Name Type Description Default
name str the name of this component. Importantly, if extract_fn is None, the code will attempt to use the name, like “hiv.prevalence”, to automatically extract data from the simulation. required
expected pd.DataFrame pandas DataFrame containing calibration data. The index should be the time ‘t’ in either floating point years or datetime. required
extract_fn callable a function to extract predicted/actual data in the same format and with the same columns as expected. required
conform str | callable specifies how to handle timepoints that don’t align exactly between the expected and the actual (simulated) data so that they “conform” to a common time grid. Whether the data represent a ‘prevalent’ or an ‘incident’ quantity determines how this alignment is performed. If ‘prevalent’, the expected and actual dataframes represent the current state of the system: stocks, like the number of currently infected individuals. In this case, the actual data are interpolated to the timepoints in expected, allowing pointwise comparison. If ‘incident’, the expected and actual dataframes represent an accumulation over a period of time: flows, like the incidence of new infections. In this case, the actual data are interpolated at the start (‘t’) and the end (‘t1’) of each period in expected, and the difference between the two interpolated values is used for comparison. Finally, ‘step_containing’ is a special case for prevalent data in which the actual data are matched using a “zero order hold”: each timepoint in expected takes the value of the simulation step that contains it, rather than an interpolated value. required
weight float The weight applied to the log likelihood of this component. The total log likelihood is the sum of the log likelihoods of all components, each multiplied by its weight. 1
include_fn callable A function accepting a single simulation and returning a boolean to determine if the simulation should be included in the current component. If None, all simulations are included. None
n_boot int Experimental! Bootstrap sum sim results over seeds before comparing against expected results. Not appropriate for all component types. None
combine_reps str How to combine multiple repetitions of the same pars. Options are None, ‘mean’, ‘sum’, or other such operation. Default is None, which evaluates replicates independently instead of first combining before likelihood evaluation. None
kwargs Additional arguments to pass to the likelihood function required
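
The weight description above amounts to a weighted sum over components. A minimal sketch of that arithmetic follows. It uses plain dictionaries with made-up values instead of real CalibComponent objects, and the component name “hiv.new_infections” is hypothetical.

```python
# Hypothetical per-component values, as eval might return them
# (negative log likelihoods), paired with each component's weight.
components = [
    {"name": "hiv.prevalence", "nll": 12.5, "weight": 1.0},
    {"name": "hiv.new_infections", "nll": 40.0, "weight": 0.5},
]


def total_nll(components):
    """Sum each component's value multiplied by its weight."""
    return sum(c["weight"] * c["nll"] for c in components)


total = total_nll(components)
print(total)  # 12.5 * 1.0 + 40.0 * 0.5 = 32.5
```

A weight below 1 reduces a component's influence on the total without removing it. A weight of 0 would drop it entirely.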

Methods

Name Description
eval Compute and return the negative log likelihood
eval
calibration.CalibComponent.eval(sim, **kwargs)

Compute and return the negative log likelihood

Calibration

calibration.Calibration(
    self,
    sim,
    calib_pars,
    n_workers=None,
    total_trials=None,
    reseed=True,
    build_fn=None,
    build_kw=None,
    eval_fn=None,
    eval_kw=None,
    components=None,
    prune_fn=None,
    label=None,
    study_name=None,
    db_name=None,
    keep_db=None,
    continue_db=None,
    storage=None,
    sampler=None,
    die=False,
    debug=False,
    verbose=True,
)

A class to handle calibration of Starsim simulations. Uses the Optuna hyperparameter optimization library (optuna.org).

Parameters

Name Type Description Default
sim Sim the base simulation to calibrate required
calib_pars dict a dictionary of the parameters to calibrate of the format dict(key1=dict(low=1, high=2, guess=1.5, **kwargs), key2=...), where kwargs can include “suggest_type” to choose the suggest method of the trial (e.g. suggest_float) and args passed to the trial suggest function like “log” and “step” required
n_workers int the number of parallel workers (if None, will use all available CPUs) None
total_trials int the total number of trials to run; each worker will run approximately n_trials = total_trials / n_workers None
reseed bool whether to generate new random seeds for each trial True
build_fn callable function that takes a sim object and calib_pars dictionary and returns a modified sim None
build_kw dict a dictionary of options that are passed to build_fn to aid in modifying the base simulation. The API is self.build_fn(sim, calib_pars=calib_pars, **self.build_kw), where sim is a copy of the base simulation to be modified with calib_pars None
components list CalibComponents independently assess pseudo-likelihood as part of evaluating the quality of input parameters None
prune_fn callable Function that takes a dictionary of parameters and returns True if the trial should be pruned None
eval_fn callable Function mapping a sim to a float (e.g. negative log likelihood) to be minimized. If None, the default will use CalibComponents. None
eval_kw dict Additional keyword arguments to pass to the eval_fn None
label str a label for this calibration object None
study_name str name of the optuna study None
db_name str the name of the database file (default: ‘starsim_calibration.db’) None
continue_db bool whether to continue if the database already exists; if false, any existing database is deleted (default: false) None
keep_db bool whether to keep the database after calibration; if false, the database is deleted (default: false) None
storage str the location of the database (default: sqlite) None
sampler BaseSampler the sampler used by optuna, like optuna.samplers.TPESampler None
die bool whether to stop if an exception is encountered (default: false) False
debug bool if True, do not run in parallel False
verbose bool whether to print details of the calibration True
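
To make the calib_pars format concrete, the sketch below shows one way such a dictionary could be turned into calls on an Optuna trial. It is an illustration under stated assumptions, not the Calibration class's actual code. MidpointTrial is a stand-in for optuna.trial.Trial so the example runs without Optuna, sample_pars is a hypothetical helper, the parameter names “beta” and “n_contacts” are invented, and falling back to suggest_float when “suggest_type” is absent is an assumption.

```python
# Stand-in for optuna.trial.Trial so the sketch runs without Optuna installed.
# The method signatures mirror Optuna's suggest_float / suggest_int.
class MidpointTrial:
    def suggest_float(self, name, low, high, *, step=None, log=False):
        # Deterministic "sample": geometric midpoint on a log scale,
        # arithmetic midpoint otherwise.
        return (low * high) ** 0.5 if log else (low + high) / 2

    def suggest_int(self, name, low, high, *, step=1, log=False):
        return (low + high) // 2


# calib_pars in the documented format; the parameter names are hypothetical.
calib_pars = dict(
    beta=dict(low=0.01, high=1.0, guess=0.1, suggest_type="suggest_float", log=True),
    n_contacts=dict(low=2, high=10, guess=4, suggest_type="suggest_int"),
)


def sample_pars(trial, calib_pars):
    """Ask the trial for one value per parameter (illustrative dispatch only)."""
    values = {}
    for name, spec in calib_pars.items():
        spec = dict(spec)        # copy so the caller's dictionary is not mutated
        spec.pop("guess", None)  # the guess is not an argument to suggest_*
        method = spec.pop("suggest_type", "suggest_float")  # assumed default
        values[name] = getattr(trial, method)(name, **spec)
    return values


sampled = sample_pars(MidpointTrial(), calib_pars)
print(sampled)
```

With a real trial, the remaining keys (“low”, “high”, “log”, “step”) are passed straight through to the chosen suggest method, which is why they must match that method's keyword arguments.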

Methods

Name Description
calibrate Perform calibration.
check_fit Run before and after simulations to validate the fit
make_study Make a study, deleting any existing study unless continue_db is set
parse_study Parse the study into a data frame – called automatically
plot Plot the calibration results.
plot_final Plot sims after calibration
plot_optuna Plot Optuna’s visualizations
remove_db Remove the database file if keep_db is false and the path exists
run_sim Create and run a simulation
run_trial Define the objective for Optuna
run_workers Run multiple workers in parallel
to_df Return the top K results as a dataframe, sorted by value
to_json Convert the results to JSON
worker Run a single worker
calibrate
calibration.Calibration.calibrate(calib_pars=None, **kwargs)

Perform calibration.

Parameters
Name Type Description Default
calib_pars dict if supplied, overwrite stored calib_pars None
kwargs dict if supplied, overwrite stored run_args (n_trials, n_workers, etc.) {}
check_fit
calibration.Calibration.check_fit(do_plot=True)

Run before and after simulations to validate the fit

make_study
calibration.Calibration.make_study()

Make a study, deleting any existing study unless continue_db is set

parse_study
calibration.Calibration.parse_study(study)

Parse the study into a data frame – called automatically

plot
calibration.Calibration.plot(**kwargs)

Plot the calibration results. For a component-based likelihood, it only makes sense to directly call plot after calling eval_fn.

plot_final
calibration.Calibration.plot_final(**kwargs)

Plot sims after calibration

Parameters
Name Type Description Default
kwargs dict passed to MultiSim.plot() {}
plot_optuna
calibration.Calibration.plot_optuna(methods=None)

Plot Optuna’s visualizations

remove_db
calibration.Calibration.remove_db()

Remove the database file if keep_db is false and the path exists

run_sim
calibration.Calibration.run_sim(calib_pars=None, label=None)

Create and run a simulation

run_trial
calibration.Calibration.run_trial(trial)

Define the objective for Optuna

run_workers
calibration.Calibration.run_workers()

Run multiple workers in parallel

to_df
calibration.Calibration.to_df(top_k=None)

Return the top K results as a dataframe, sorted by value

to_json
calibration.Calibration.to_json(filename=None, indent=2, **kwargs)

Convert the results to JSON

worker
calibration.Calibration.worker()

Run a single worker

Functions

Name Description
linear_accum Interpolate in the cumulative sum, then difference. Use for incident data (flows) like incidence or new_deaths.
linear_interp Simply interpolate, use for prevalent (stock) data like prevalence
step_containing Find the step containing the timepoint. Use for prevalent data like prevalence where you want to match a specific time point rather than interpolate.

linear_accum

calibration.linear_accum(expected, actual)

Interpolate in the cumulative sum, then difference. Use for incident data (flows) like incidence or new_deaths. The accumulation is done between ‘t’ and ‘t1’, both of which must be present in the index of expected and actual dataframes.

Parameters

Name Type Description Default
expected pd.DataFrame The expected data from field observation, must have ‘t’ and ‘t1’ in the index and columns corresponding to specific needs of the selected component. required
actual pd.DataFrame The actual data from the simulation, must have ‘t’ and ‘t1’ in the index and columns corresponding to specific needs of the selected component. required
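
The “interpolate in the cumulative sum, then difference” idea can be sketched with plain lists. This is an illustration of the technique only, not the function's implementation: it does not use DataFrames, the helper names interp and accum_between are hypothetical, and the numbers are made up.

```python
from bisect import bisect_right
from itertools import accumulate


def interp(x, xs, ys):
    """Linearly interpolate the points (xs, ys) at x, clamping outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # first index with xs[i] > x
    frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + frac * (ys[i] - ys[i - 1])


# Simulated new infections per step; step k covers edges[k] to edges[k + 1].
edges = [2020.0, 2020.5, 2021.0, 2021.5, 2022.0]
new_infections = [10, 20, 30, 40]

# Cumulative count at each edge, starting from zero.
cumulative = [0] + list(accumulate(new_infections))  # [0, 10, 30, 60, 100]


def accum_between(t, t1):
    """Simulated count accumulated over the observation window from t to t1."""
    return interp(t1, edges, cumulative) - interp(t, edges, cumulative)


# An observation window that does not line up with the simulation steps.
print(accum_between(2020.25, 2021.25))  # 45.0 - 5.0 = 40.0
```

Interpolating the cumulative series rather than the per-step counts keeps the total conserved: windows that tile the full simulated period sum to the overall count.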

linear_interp

calibration.linear_interp(expected, actual)

Simply interpolate, use for prevalent (stock) data like prevalence

Parameters

Name Type Description Default
expected pd.DataFrame The expected data from field observation, must have ‘t’ in the index and columns corresponding to specific needs of the selected component. required
actual pd.DataFrame The actual data from the simulation, must have ‘t’ in the index and columns corresponding to specific needs of the selected component. required
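
For prevalent data, conforming reduces to evaluating the simulated series at the observed timepoints. The sketch below shows that with plain lists and a hypothetical interp helper. It illustrates the idea without DataFrames and uses made-up numbers.

```python
from bisect import bisect_right


def interp(x, xs, ys):
    """Linearly interpolate the points (xs, ys) at x, clamping outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # first index with xs[i] > x
    frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + frac * (ys[i] - ys[i - 1])


# Simulated prevalence at the simulation's own timepoints.
sim_t = [2020.0, 2021.0, 2022.0]
sim_prev = [0.10, 0.20, 0.16]

# Timepoints at which field observations exist.
obs_t = [2020.5, 2021.25]

# Simulated prevalence conformed to the observation times
# (approximately 0.15 and 0.19 here).
conformed = [interp(t, sim_t, sim_prev) for t in obs_t]
print(conformed)
```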

step_containing

calibration.step_containing(expected, actual)

Find the step containing the timepoint. Use for prevalent data like prevalence where you want to match a specific time point rather than interpolate.

Parameters

Name Type Description Default
expected pd.DataFrame The expected data from field observation, must have ‘t’ in the index and columns corresponding to specific needs of the selected component. required
actual pd.DataFrame The actual data from the simulation, must have ‘t’ in the index and columns corresponding to specific needs of the selected component. required
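
A zero-order hold lookup can be sketched as follows. This is an illustration with plain lists, not the function's implementation. It assumes each simulation step is labelled by its start time and holds its value until the next step begins. The helper name step_value is hypothetical.

```python
from bisect import bisect_right

# Simulated prevalence, one value per step, labelled by step start time (assumed).
sim_t = [2020.0, 2021.0, 2022.0]
sim_prev = [0.10, 0.20, 0.16]


def step_value(t):
    """Value of the simulation step whose interval contains t (zero-order hold)."""
    i = bisect_right(sim_t, t) - 1  # last step starting at or before t
    i = max(i, 0)                   # clamp times before the first step
    return sim_prev[i]


# No interpolation: each observed time simply picks up its step's value.
print([step_value(t) for t in (2020.5, 2021.0, 2021.9)])  # [0.1, 0.2, 0.2]
```

Unlike linear interpolation, the result never takes a value that the simulation did not itself produce.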