calibration
Define the calibration class
Classes
Name | Description |
---|---|
CalibComponent | A class to compare a single channel of observed data with output from a simulation. |
Calibration | A class to handle calibration of Starsim simulations. Uses the Optuna hyperparameter optimization library (optuna.org). |
CalibComponent
calibration.CalibComponent(self,
name,
expected,
extract_fn,
conform,
weight=1,
include_fn=None,
n_boot=None,
combine_reps=None)
A class to compare a single channel of observed data with output from a simulation. The Calibration class can use several CalibComponent objects to form an overall understanding of how well a given simulation reflects observed data.
Parameters
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of this component. Importantly, if extract_fn is None, the code will attempt to use the name, like “hiv.prevalence”, to automatically extract data from the simulation. | required |
expected | pd.DataFrame | pandas DataFrame containing calibration data. The index should be the time ‘t’ in either floating point years or datetime. | required |
extract_fn | callable | A function to extract predicted/actual data in the same format and with the same columns as expected. | required |
conform | str or callable | Specifies how to handle timepoints that do not align exactly between the expected and the actual (simulated) data, so that both “conform” to a common time grid. Whether the data represent a ‘prevalent’ or an ‘incident’ quantity determines how this alignment is performed. If ‘prevalent’, the expected and actual dataframes represent the current state of the system: stocks, such as the number of currently infected individuals. In this case, the actual data are interpolated to the timepoints in expected, allowing pointwise comparison. If ‘incident’, the dataframes represent the accumulation of system states over a period of time: flows, such as the incidence of new infections. In this case, the actual data are interpolated at the start (‘t’) and the end (‘t1’) of each period of interest in expected, and the difference between the two interpolated values is used for comparison. Finally, ‘step_containing’ is a special case for prevalent data in which the actual data are interpolated using a “zero order hold” method: each timepoint in expected is matched to the value of the simulation step that contains it. | required |
weight | float | The weight applied to the log likelihood of this component. The total log likelihood is the sum of the log likelihoods of all components, each multiplied by its weight. | 1 |
include_fn | callable | A function accepting a single simulation and returning a boolean to determine if the simulation should be included in the current component. If None, all simulations are included. | None |
n_boot | int | Experimental! Bootstrap sum sim results over seeds before comparing against expected results. Not appropriate for all component types. | None |
combine_reps | str | How to combine multiple repetitions of the same pars. Options are None, ‘mean’, ‘sum’, or other such operation. Default is None, which evaluates replicates independently instead of first combining before likelihood evaluation. | None |
kwargs | | Additional arguments to pass to the likelihood function | required |
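The role of weight can be sketched in a few lines of plain Python. This is only an illustration of the weighted sum described in the table, not the library's implementation; the function name `total_nll` and the numeric values are hypothetical.

```python
def total_nll(component_nlls, weights):
    """Weighted sum of per-component negative log likelihoods (illustrative only)."""
    if len(component_nlls) != len(weights):
        raise ValueError("need exactly one weight per component")
    return sum(w * nll for w, nll in zip(weights, component_nlls))

# Hypothetical values for two components, e.g. one for prevalence and one for incidence
nlls = [12.5, 3.0]
weights = [1.0, 2.0]  # the second component counts twice as much

total = total_nll(nlls, weights)
print(total)
```

A larger weight makes a component's misfit contribute more to the overall objective, so the optimizer favors parameter sets that fit that data channel well.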
Methods
Name | Description |
---|---|
eval | Compute and return the negative log likelihood |
eval
calibration.CalibComponent.eval(sim, **kwargs)
Compute and return the negative log likelihood
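To make "negative log likelihood" concrete, here is a minimal sketch that assumes a Normal likelihood with a known standard deviation. The source does not say which likelihood a given component uses, so the function `normal_nll`, the choice of distribution, and all numbers below are assumptions for illustration.

```python
import math

def normal_nll(expected, actual, sigma):
    """Negative log likelihood of the observations `expected`, assuming each is
    drawn from Normal(mean=actual value, sd=sigma). Illustrative only."""
    nll = 0.0
    for e, a in zip(expected, actual):
        nll += 0.5 * math.log(2 * math.pi * sigma**2) + (e - a) ** 2 / (2 * sigma**2)
    return nll

observed = [0.10, 0.12, 0.15]   # hypothetical observed prevalence
close_fit = [0.11, 0.12, 0.14]  # hypothetical simulation output near the data
poor_fit = [0.30, 0.35, 0.40]   # hypothetical simulation output far from the data

nll_close = normal_nll(observed, close_fit, sigma=0.05)
nll_poor = normal_nll(observed, poor_fit, sigma=0.05)
print(nll_close < nll_poor)
```

A simulation that tracks the data more closely yields a smaller negative log likelihood, which is why lower values indicate a better fit.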
Calibration
calibration.Calibration(self,
sim,
calib_pars,
n_workers=None,
total_trials=None,
reseed=True,
build_fn=None,
build_kw=None,
eval_fn=None,
eval_kw=None,
components=None,
prune_fn=None,
label=None,
study_name=None,
db_name=None,
keep_db=None,
continue_db=None,
storage=None,
sampler=None,
die=False,
debug=False,
verbose=True)
A class to handle calibration of Starsim simulations. Uses the Optuna hyperparameter optimization library (optuna.org).
Parameters
Name | Type | Description | Default |
---|---|---|---|
sim | (Sim) | the base simulation to calibrate | required |
calib_pars | (dict) | a dictionary of the parameters to calibrate, in the format dict(key1=dict(low=1, high=2, guess=1.5, **kwargs), key2=...), where kwargs can include “suggest_type” to choose the suggest method of the trial (e.g. suggest_float) and arguments passed to the trial suggest function, like “log” and “step” | required |
n_workers | (int) | the number of parallel workers (if None, will use all available CPUs) | None |
total_trials | (int) | the total number of trials to run; each worker will run approximately n_trials = total_trials / n_workers | None |
reseed | (bool) | whether to generate new random seeds for each trial | True |
build_fn | (callable) | function that takes a sim object and calib_pars dictionary and returns a modified sim | None |
build_kw | (dict) | a dictionary of options that are passed to build_fn to aid in modifying the base simulation. The API is self.build_fn(sim, calib_pars=calib_pars, **self.build_kw), where sim is a copy of the base simulation to be modified with calib_pars | None |
components | (list) | CalibComponents independently assess pseudo-likelihood as part of evaluating the quality of input parameters | None |
prune_fn | (callable) | Function that takes a dictionary of parameters and returns True if the trial should be pruned | None |
eval_fn | (callable) | Function mapping a sim to a float (e.g. negative log likelihood) to be minimized. If None, the default will use CalibComponents. | None |
eval_kw | (dict) | Additional keyword arguments to pass to the eval_fn | None |
label | (str) | a label for this calibration object | None |
study_name | (str) | name of the optuna study | None |
db_name | (str) | the name of the database file (default: ‘starsim_calibration.db’) | None |
continue_db | (bool) | whether to continue if the database already exists, removes the database if false (default: false, any existing database will be deleted) | None |
keep_db | (bool) | whether to keep the database after calibration (default: false, the database will be deleted) | None |
storage | (str) | the location of the database (default: sqlite) | None |
sampler | (BaseSampler) | the sampler used by optuna, like optuna.samplers.TPESampler | None |
die | (bool) | whether to stop if an exception is encountered (default: false) | False |
debug | (bool) | if True, do not run in parallel | False |
verbose | (bool) | whether to print details of the calibration | True |
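The calib_pars format, the build_fn calling convention, and the trials-per-worker arithmetic from the table can be sketched without running Starsim. Everything specific here is an assumption for illustration: the parameter names `beta` and `init_prev`, the `SimpleNamespace` stand-in for a sim, and the `"value"` key used to carry a sampled value into build_fn.

```python
from types import SimpleNamespace

# Search space in the documented format: one dict per parameter to calibrate
calib_pars = dict(
    beta=dict(low=0.01, high=0.30, guess=0.15, suggest_type="suggest_float", log=True),
    init_prev=dict(low=0.01, high=0.25, guess=0.05),
)

def build_fn(sim, calib_pars, **kwargs):
    """Hypothetical build function: copy each parameter value onto the sim.
    The "value" key is an assumption about how sampled values are passed in."""
    for name, spec in calib_pars.items():
        setattr(sim, name, spec["value"])
    return sim

# Stand-in for one trial: use each parameter's initial guess as its sampled value
trial_pars = {name: dict(spec, value=spec["guess"]) for name, spec in calib_pars.items()}
sim = build_fn(SimpleNamespace(), calib_pars=trial_pars)

# Approximate number of trials each worker runs
total_trials, n_workers = 100, 4
n_trials = total_trials / n_workers

print(sim.beta, sim.init_prev, n_trials)
```

Keeping the search space (low, high, guess) separate from the code that applies values to the simulation lets the same build_fn be reused across different parameter ranges.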
Methods
Name | Description |
---|---|
calibrate | Perform calibration. |
check_fit | Run before and after simulations to validate the fit |
make_study | Make a study, deleting if it already exists and user does not want to continue_db |
parse_study | Parse the study into a data frame – called automatically |
plot | Plot the calibration results. |
plot_final | Plot sims after calibration |
plot_optuna | Plot Optuna’s visualizations |
remove_db | Remove the database file if keep_db is false and the path exists |
run_sim | Create and run a simulation |
run_trial | Define the objective for Optuna |
run_workers | Run multiple workers in parallel |
to_df | Return the top K results as a dataframe, sorted by value |
to_json | Convert the results to JSON |
worker | Run a single worker |
calibrate
calibration.Calibration.calibrate(calib_pars=None, **kwargs)
Perform calibration.
Parameters
Name | Type | Description | Default |
---|---|---|---|
calib_pars | dict | if supplied, overwrite stored calib_pars | None |
kwargs | dict | if supplied, overwrite stored run_args (n_trials, n_workers, etc.) | {} |
check_fit
calibration.Calibration.check_fit(do_plot=True)
Run before and after simulations to validate the fit
make_study
calibration.Calibration.make_study()
Make a study, deleting if it already exists and user does not want to continue_db
parse_study
calibration.Calibration.parse_study(study)
Parse the study into a data frame – called automatically
plot
calibration.Calibration.plot(**kwargs)
Plot the calibration results. For a component-based likelihood, it only makes sense to directly call plot after calling eval_fn.
plot_final
calibration.Calibration.plot_final(**kwargs)
Plot sims after calibration
Parameters
Name | Type | Description | Default |
---|---|---|---|
kwargs | dict | passed to MultiSim.plot() | {} |
plot_optuna
calibration.Calibration.plot_optuna(methods=None)
Plot Optuna’s visualizations
remove_db
calibration.Calibration.remove_db()
Remove the database file if keep_db is false and the path exists
run_sim
calibration.Calibration.run_sim(calib_pars=None, label=None)
Create and run a simulation
run_trial
calibration.Calibration.run_trial(trial)
Define the objective for Optuna
run_workers
calibration.Calibration.run_workers()
Run multiple workers in parallel
to_df
calibration.Calibration.to_df(top_k=None)
Return the top K results as a dataframe, sorted by value
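The idea of "top K results, sorted by value" can be illustrated with pandas alone. This sketch assumes that lower objective values are better and that the objective is stored in a column named `value`; the helper `top_k_trials`, the column names, and the numbers are hypothetical and do not reproduce the method's actual output.

```python
import pandas as pd

# Hypothetical trial results: one row per trial
trials = pd.DataFrame({
    "beta": [0.05, 0.12, 0.20, 0.09],
    "value": [41.2, 18.7, 33.0, 22.4],  # objective value for each trial
})

def top_k_trials(df, top_k=None):
    """Sort trials by objective value (ascending) and keep the first top_k rows."""
    out = df.sort_values("value").reset_index(drop=True)
    return out if top_k is None else out.head(top_k)

best = top_k_trials(trials, top_k=2)
print(best)
```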
to_json
calibration.Calibration.to_json(filename=None, indent=2, **kwargs)
Convert the results to JSON
worker
calibration.Calibration.worker()
Run a single worker
Functions
Name | Description |
---|---|
linear_accum | Interpolate in the cumulative sum, then difference. Use for incident data |
linear_interp | Simply interpolate, use for prevalent (stock) data like prevalence |
step_containing | Find the step containing the timepoint. Use for prevalent data like prevalence. |
linear_accum
calibration.linear_accum(expected, actual)
Interpolate in the cumulative sum, then difference. Use for incident data (flows) like incidence or new_deaths. The accumulation is done between ‘t’ and ‘t1’, both of which must be present in the index of expected and actual dataframes.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expected | pd.DataFrame | The expected data from field observation, must have ‘t’ and ‘t1’ in the index and columns corresponding to specific needs of the selected component. | required |
actual | pd.DataFrame | The actual data from the simulation, must have ‘t’ and ‘t1’ in the index and columns corresponding to specific needs of the selected component. | required |
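The "interpolate in the cumulative sum, then difference" idea can be sketched with NumPy arrays instead of DataFrames. This is an illustration of the technique under stated assumptions, not the library's code: the helper `accumulate_between` is hypothetical, and it assumes that `sim_new[i]` holds the events accumulated during the step ending at `sim_t[i]`.

```python
import numpy as np

def accumulate_between(sim_t, sim_new, t0, t1):
    """Interpolate the cumulative sum of `sim_new` at t0 and t1, then difference."""
    cum = np.cumsum(sim_new)
    return np.interp(t1, sim_t, cum) - np.interp(t0, sim_t, cum)

# Hypothetical simulation output on a half-year grid
sim_t = np.array([2000.0, 2000.5, 2001.0, 2001.5, 2002.0])
sim_new = np.array([0.0, 10.0, 10.0, 20.0, 20.0])  # new events per step

# Observation periods [t, t1] that do not need to match the simulation grid
period_start = np.array([2000.0, 2001.0])
period_end = np.array([2001.0, 2002.0])

totals = accumulate_between(sim_t, sim_new, period_start, period_end)
partial = accumulate_between(sim_t, sim_new, 2000.25, 2001.25)
print(totals, partial)
```

Working in the cumulative sum means a period that starts or ends partway through a simulation step receives a proportional share of that step's events.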
linear_interp
calibration.linear_interp(expected, actual)
Simply interpolate, use for prevalent (stock) data like prevalence
Parameters
Name | Type | Description | Default |
---|---|---|---|
expected | pd.DataFrame | The expected data from field observation, must have ‘t’ in the index and columns corresponding to specific needs of the selected component. | required |
actual | pd.DataFrame | The actual data from the simulation, must have ‘t’ in the index and columns corresponding to specific needs of the selected component. | required |
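Pointwise linear interpolation of a stock quantity can be sketched with `numpy.interp`. The arrays below are hypothetical and stand in for the ‘t’ index and a single data column; this shows the general technique rather than the function's actual implementation.

```python
import numpy as np

# Hypothetical simulated prevalence on an annual grid
sim_t = np.array([2000.0, 2001.0, 2002.0])
sim_prev = np.array([0.10, 0.20, 0.40])

# Observation times that fall between simulation timepoints
obs_t = np.array([2000.5, 2001.75])

# Linearly interpolate the simulated values onto the observation times
sim_at_obs = np.interp(obs_t, sim_t, sim_prev)
print(sim_at_obs)
```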
step_containing
calibration.step_containing(expected, actual)
Find the step containing the timepoint. Use for prevalent data like prevalence where you want to match a specific time point rather than interpolate.
Parameters
Name | Type | Description | Default |
---|---|---|---|
expected | pd.DataFrame | The expected data from field observation, must have ‘t’ in the index and columns corresponding to specific needs of the selected component. | required |
actual | pd.DataFrame | The actual data from the simulation, must have ‘t’ in the index and columns corresponding to specific needs of the selected component. | required |
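A zero-order hold lookup can be sketched with `numpy.searchsorted`. The helper name `value_at_containing_step` is hypothetical, and the sketch assumes that simulation step i covers the half-open interval from `sim_t[i]` up to, but not including, `sim_t[i + 1]`; the library may define step boundaries differently.

```python
import numpy as np

def value_at_containing_step(sim_t, sim_values, obs_t):
    """Return, for each observation time, the value of the step that contains it."""
    # Index of the last simulation timepoint that is <= each observation time
    idx = np.searchsorted(sim_t, obs_t, side="right") - 1
    # Clamp observation times that fall outside the simulated range
    idx = np.clip(idx, 0, len(sim_t) - 1)
    return sim_values[idx]

sim_t = np.array([2000.0, 2001.0, 2002.0])
sim_prev = np.array([0.10, 0.20, 0.40])
obs_t = np.array([2000.5, 2001.0, 2001.99])

held = value_at_containing_step(sim_t, sim_prev, obs_t)
print(held)
```

Unlike linear interpolation, the returned values are always ones the simulation actually produced, which suits quantities recorded once per step.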