calibration

Define the calibration class

Classes

Name	Description
CalibComponent	A class to compare a single channel of observed data with output from a simulation.
Calibration	A class to handle calibration of Starsim simulations. Uses the Optuna hyperparameter optimization library (optuna.org).

CalibComponent

calibration.CalibComponent(
    self,
    name,
    expected,
    extract_fn,
    conform,
    weight=1,
    include_fn=None,
    n_boot=None,
    combine_reps=None,
)

A class to compare a single channel of observed data with output from a simulation. The Calibration class can use several CalibComponent objects to form an overall understanding of how well a given simulation reflects observed data.

Parameters

Name	Type	Description	Default
name	str	The name of this component. Importantly, if extract_fn is None, the code will attempt to use the name, like “hiv.prevalence”, to automatically extract data from the simulation.	required
expected	pd.DataFrame	A pandas DataFrame containing calibration data. The index should be the time ‘t’ in either floating point years or datetime.	required
extract_fn	callable	A function to extract predicted/actual data in the same format and with the same columns as `expected`.	required
conform	str \| callable	Specifies how to handle timepoints that do not align exactly between the expected and the actual (simulated) data, so that both “conform” to a common time grid. Whether the data represent a ‘prevalent’ or an ‘incident’ quantity determines how the alignment is performed. If ‘prevalent’, the expected and actual dataframes represent the current state of the system: stocks, like the number of currently infected individuals. In this case, the actual data are interpolated to the timepoints in expected, allowing pointwise comparison. If ‘incident’, the dataframes represent the accumulation of system states over a period of time: flows, like the incidence of new infections. In this case, the actual data are interpolated at the start (‘t’) and the end (‘t1’) of each period of interest in expected, and the difference between the two interpolated values is used for comparison. Finally, ‘step_containing’ is a special case for prevalent data in which the actual data are interpolated using a “zero order hold”: each timepoint in the expected data is matched to the value of the simulation step that contains it.	required
weight	float	The weight applied to the log likelihood of this component. The total log likelihood is the sum of the log likelihoods of all components, each multiplied by its weight.	`1`
include_fn	callable	A function accepting a single simulation and returning a boolean to determine if the simulation should be included in the current component. If None, all simulations are included.	`None`
n_boot	int	Experimental! Bootstrap sum sim results over seeds before comparing against expected results. Not appropriate for all component types.	`None`
combine_reps	str	How to combine multiple repetitions of the same pars. Options are None, ‘mean’, ‘sum’, or other such operation. Default is None, which evaluates replicates independently instead of first combining before likelihood evaluation.	`None`
kwargs		Additional arguments to pass to the likelihood function	required
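
As a minimal illustration of how the weight parameter enters the total, the sketch below sums per-component negative log likelihoods, each multiplied by its weight. The component names, the numeric values, and the helper `weighted_total` are hypothetical; this is a stand-in for the idea described above, not Starsim's implementation.

```python
# Hypothetical per-component negative log likelihoods and weights.
# In practice each value would come from a CalibComponent's eval method.
component_nlls = {'prevalence': 12.5, 'incidence': 4.0}
weights = {'prevalence': 1.0, 'incidence': 2.0}


def weighted_total(nlls, weights):
    """Sum the component values, each multiplied by its weight."""
    return sum(weights[name] * value for name, value in nlls.items())


total = weighted_total(component_nlls, weights)
print(total)  # 12.5 * 1.0 + 4.0 * 2.0 = 20.5
```

A weight above 1 makes a component count more heavily in the total; a weight of 0 would effectively switch it off.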

Methods

Name	Description
eval	Compute and return the negative log likelihood

eval

calibration.CalibComponent.eval(sim, **kwargs)

Compute and return the negative log likelihood

Calibration

calibration.Calibration(
    self,
    sim,
    calib_pars,
    n_workers=None,
    total_trials=None,
    reseed=True,
    build_fn=None,
    build_kw=None,
    eval_fn=None,
    eval_kw=None,
    components=None,
    prune_fn=None,
    label=None,
    study_name=None,
    db_name=None,
    keep_db=None,
    continue_db=None,
    storage=None,
    sampler=None,
    die=False,
    debug=False,
    verbose=True,
)

A class to handle calibration of Starsim simulations. Uses the Optuna hyperparameter optimization library (optuna.org).

Parameters

Name	Type	Description	Default
sim	Sim	the base simulation to calibrate	required
calib_pars	dict	a dictionary of the parameters to calibrate, in the format `dict(key1=dict(low=1, high=2, guess=1.5, **kwargs), key2=...)`, where kwargs can include “suggest_type” to choose the suggest method of the trial (e.g. suggest_float) and args passed to the trial suggest function like “log” and “step”	required
n_workers	int	the number of parallel workers (if None, will use all available CPUs)	`None`
total_trials	int	the total number of trials to run; each worker will run approximately n_trials = total_trials / n_workers	`None`
reseed	bool	whether to generate new random seeds for each trial	`True`
build_fn	callable	function that takes a sim object and calib_pars dictionary and returns a modified sim	`None`
build_kw	dict	a dictionary of options that are passed to build_fn to aid in modifying the base simulation. The API is `self.build_fn(sim, calib_pars=calib_pars, **self.build_kw)`, where sim is a copy of the base simulation to be modified with calib_pars	`None`
components	list	CalibComponents independently assess pseudo-likelihood as part of evaluating the quality of input parameters	`None`
prune_fn	callable	Function that takes a dictionary of parameters and returns True if the trial should be pruned	`None`
eval_fn	callable	Function mapping a sim to a float (e.g. negative log likelihood) to be minimized. If None, the default will use CalibComponents.	`None`
eval_kw	dict	Additional keyword arguments to pass to the eval_fn	`None`
label	str	a label for this calibration object	`None`
study_name	str	name of the optuna study	`None`
db_name	str	the name of the database file (default: ‘starsim_calibration.db’)	`None`
continue_db	bool	whether to continue if the database already exists; if false, any existing database is deleted (default: false)	`None`
keep_db	bool	whether to keep the database after calibration (default: false, the database will be deleted)	`None`
storage	str	the location of the database (default: sqlite)	`None`
sampler	`BaseSampler`	the sampler used by optuna, like optuna.samplers.TPESampler	`None`
die	bool	whether to stop if an exception is encountered (default: false)	`False`
debug	bool	if True, do not run in parallel	`False`
verbose	bool	whether to print details of the calibration	`True`
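
To make the calib_pars format concrete, here is a small sketch of such a dictionary, together with the approximate per-worker trial count described for total_trials. The parameter names (`beta`, `init_prev`, `n_contacts`) and all bounds are hypothetical, and the assumption that `suggest_int` is an acceptable suggest_type follows from Optuna's trial API rather than from anything stated above.

```python
# Hypothetical parameter names and bounds, for illustration only.
calib_pars = dict(
    # Sampled on a log scale via the trial's suggest_float method
    beta=dict(low=0.01, high=0.30, guess=0.15, suggest_type='suggest_float', log=True),
    # No suggest_type given: the library's default suggest method applies
    init_prev=dict(low=0.01, high=0.25, guess=0.05),
    # Integer-valued parameter with a step (assumes suggest_int is accepted)
    n_contacts=dict(low=2, high=10, guess=4, suggest_type='suggest_int', step=2),
)

# Sanity check: each guess should lie within its bounds.
for name, spec in calib_pars.items():
    assert spec['low'] <= spec['guess'] <= spec['high'], name

# Approximate number of trials each worker runs.
total_trials = 100
n_workers = 4
n_trials = total_trials / n_workers
print(n_trials)  # 25.0
```

Checking the bounds before starting a long calibration is cheap and catches swapped low/high values early.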

Methods

Name	Description
calibrate	Perform calibration.
check_fit	Run before and after simulations to validate the fit
make_study	Make a study, deleting it first if it already exists and continue_db is false
parse_study	Parse the study into a data frame – called automatically
plot	Plot the calibration results.
plot_final	Plot sims after calibration
plot_optuna	Plot Optuna’s visualizations
remove_db	Remove the database file if keep_db is false and the path exists
run_sim	Create and run a simulation
run_trial	Define the objective for Optuna
run_workers	Run multiple workers in parallel
to_df	Return the top K results as a dataframe, sorted by value
to_json	Convert the results to JSON
worker	Run a single worker

calibrate

calibration.Calibration.calibrate(calib_pars=None, **kwargs)

Perform calibration.

Parameters

Name	Type	Description	Default
calib_pars	dict	if supplied, overwrite stored calib_pars	`None`
kwargs	dict	if supplied, overwrite stored run_args (n_trials, n_workers, etc.)	`{}`

check_fit

calibration.Calibration.check_fit(do_plot=True)

Run before and after simulations to validate the fit

make_study

calibration.Calibration.make_study()

Make a study, deleting it first if it already exists and continue_db is false

parse_study

calibration.Calibration.parse_study(study)

Parse the study into a data frame – called automatically

plot

calibration.Calibration.plot(**kwargs)

Plot the calibration results. For a component-based likelihood, it only makes sense to directly call plot after calling eval_fn.

plot_final

calibration.Calibration.plot_final(**kwargs)

Plot sims after calibration

Parameters

Name	Type	Description	Default
kwargs	dict	passed to MultiSim.plot()	`{}`

plot_optuna

calibration.Calibration.plot_optuna(methods=None)

Plot Optuna’s visualizations

remove_db

calibration.Calibration.remove_db()

Remove the database file if keep_db is false and the path exists

run_sim

calibration.Calibration.run_sim(calib_pars=None, label=None)

Create and run a simulation

run_trial

calibration.Calibration.run_trial(trial)

Define the objective for Optuna

run_workers

calibration.Calibration.run_workers()

Run multiple workers in parallel

to_df

calibration.Calibration.to_df(top_k=None)

Return the top K results as a dataframe, sorted by value

to_json

calibration.Calibration.to_json(filename=None, indent=2, **kwargs)

Convert the results to JSON

worker

calibration.Calibration.worker()

Run a single worker

Functions

Name	Description
linear_accum	Interpolate in the cumulative sum, then difference. Use for incident data
linear_interp	Simply interpolate, use for prevalent (stock) data like prevalence
step_containing	Find the step containing the timepoint. Use for prevalent data like prevalence where you want to match a specific time point rather than interpolate.

linear_accum

calibration.linear_accum(expected, actual)

Interpolate in the cumulative sum, then difference. Use for incident data (flows) like incidence or new_deaths. The accumulation is done between ‘t’ and ‘t1’, both of which must be present in the index of expected and actual dataframes.

Parameters

Name	Type	Description	Default
expected	pd.DataFrame	The expected data from field observation, must have ‘t’ and ‘t1’ in the index and columns corresponding to specific needs of the selected component.	required
actual	pd.DataFrame	The actual data from the simulation, must have ‘t’ and ‘t1’ in the index and columns corresponding to specific needs of the selected component.	required
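
The "interpolate in the cumulative sum, then difference" idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: the helpers `interp` and `accum_between` are hypothetical, the data are made up, and plain lists stand in for the DataFrames that linear_accum actually takes.

```python
from bisect import bisect_right
from itertools import accumulate


def interp(x, xp, fp):
    """Piecewise-linear interpolation of (xp, fp) at x, clamped at both ends."""
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    i = bisect_right(xp, x) - 1  # index of the left neighbour of x
    frac = (x - xp[i]) / (xp[i + 1] - xp[i])
    return fp[i] + frac * (fp[i + 1] - fp[i])


def accum_between(t, t1, sim_t, sim_flow):
    """Flow accumulated between t and t1, via the interpolated cumulative sum."""
    cum = list(accumulate(sim_flow))  # running total of the flow
    return interp(t1, sim_t, cum) - interp(t, sim_t, cum)


# Made-up simulation output: new infections per timestep.
sim_t = [0, 1, 2, 3, 4]
new_infections = [10, 20, 30, 40, 50]  # cumulative: 10, 30, 60, 100, 150

print(accum_between(1, 3, sim_t, new_infections))      # 100 - 30 = 70.0
print(accum_between(0.5, 2.5, sim_t, new_infections))  # 80 - 20 = 60.0
```

Working on the cumulative sum means an observation window that does not line up with simulation timesteps still receives a proportional share of each partially covered step.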

linear_interp

calibration.linear_interp(expected, actual)

Simply interpolate, use for prevalent (stock) data like prevalence

Parameters

Name	Type	Description	Default
expected	pd.DataFrame	The expected data from field observation, must have ‘t’ in the index and columns corresponding to specific needs of the selected component.	required
actual	pd.DataFrame	The actual data from the simulation, must have ‘t’ in the index and columns corresponding to specific needs of the selected component.	required
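
For prevalent data, the alignment amounts to evaluating the simulated series at each observed timepoint. The sketch below shows this with plain lists; the helper `interp` and the numbers are hypothetical, and the real function operates on DataFrames indexed by ‘t’.

```python
from bisect import bisect_right


def interp(x, xp, fp):
    """Piecewise-linear interpolation of (xp, fp) at x, clamped at both ends."""
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    i = bisect_right(xp, x) - 1  # index of the left neighbour of x
    frac = (x - xp[i]) / (xp[i + 1] - xp[i])
    return fp[i] + frac * (fp[i + 1] - fp[i])


# Made-up simulated prevalence, reported every five years.
sim_t = [2000, 2005, 2010]
sim_prev = [0.10, 0.20, 0.15]

# Observation times that fall between simulation outputs.
expected_t = [2002, 2007.5]

aligned = [interp(t, sim_t, sim_prev) for t in expected_t]
print(aligned)  # approximately [0.14, 0.175]
```

Once aligned, each simulated value can be compared pointwise with the observation at the same time.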

step_containing

calibration.step_containing(expected, actual)

Find the step containing the timepoint. Use for prevalent data like prevalence where you want to match a specific time point rather than interpolate.

Parameters

Name	Type	Description	Default
expected	pd.DataFrame	The expected data from field observation, must have ‘t’ in the index and columns corresponding to specific needs of the selected component.	required
actual	pd.DataFrame	The actual data from the simulation, must have ‘t’ in the index and columns corresponding to specific needs of the selected component.	required
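
A zero-order-hold lookup can be sketched as below. One assumption to note: here each simulated value is held from its own timestamp until the next one, which is the textbook convention; the library may label steps differently. The helper `step_value` and the data are hypothetical, and lists stand in for DataFrames.

```python
from bisect import bisect_right


def step_value(t, sim_t, sim_vals):
    """Value of the simulation step containing t (zero-order hold).

    Assumes step i covers the half-open interval [sim_t[i], sim_t[i + 1]).
    Times outside the simulated range are clamped to the first or last step.
    """
    i = bisect_right(sim_t, t) - 1
    i = min(max(i, 0), len(sim_t) - 1)
    return sim_vals[i]


# Made-up simulated prevalence on an annual grid.
sim_t = [2000.0, 2001.0, 2002.0]
sim_prev = [0.10, 0.12, 0.15]

print(step_value(2001.7, sim_t, sim_prev))  # 0.12, no blending with 0.15
print(step_value(2001.0, sim_t, sim_prev))  # 0.12, the step starting at 2001
```

Unlike linear interpolation, the result is always one of the simulated values, which suits quantities that should not be averaged across steps.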