samples

Create a class for storing a large number of simulations.

Hierarchy:

- result: parameters, seed, beta
- samples: collection of results with the same parameters but different seeds
- dataset: collection of samples (with different parameters)

Classes

Name Description
Dataset Store the results and provide options for filtering
Samples Stores CSV outputs and summary dataframes

Dataset

samples.Dataset(folder=None, results=None, *args, **kwargs)

Store the results and provide options for filtering

Attributes

Name Description
ids Return dictionary of parameters across results

Methods

Name Description
filter Return results matching particular ids
get Retrieve a single result from a filter operation
filter
samples.Dataset.filter(**kwargs)

Return results matching particular ids.

kwargs: key-value pairs that should be present in Samples.id, used to filter the results. A list or set of values can be given to match multiple items.
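The matching rule described above can be sketched in plain Python. This is an illustration only, not the library's implementation: `matches` and `filter_ids` are hypothetical helpers, and each set of samples is represented only by its id dictionary.

```python
def matches(sample_id, **kwargs):
    """Return True if every key in kwargs is present in sample_id with an accepted value."""
    for key, wanted in kwargs.items():
        if key not in sample_id:
            return False
        # A list/set/tuple means "any of these values"; anything else is an exact match
        accepted = wanted if isinstance(wanted, (list, set, tuple)) else [wanted]
        if sample_id[key] not in accepted:
            return False
    return True


def filter_ids(ids, **kwargs):
    """Keep only the id dictionaries that satisfy all key-value pairs."""
    return [sample_id for sample_id in ids if matches(sample_id, **kwargs)]


# Hypothetical identifiers for three sets of samples
ids = [
    {"scenario": "foo", "coverage": 0.1},
    {"scenario": "foo", "coverage": 0.5},
    {"scenario": "bar", "coverage": 0.1},
]

foo_only = filter_ids(ids, scenario="foo")
low_coverage = filter_ids(ids, scenario=["foo", "bar"], coverage=0.1)
```

Here `foo_only` keeps the two "foo" entries, while `low_coverage` combines a multi-value match on one key with an exact match on another.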

get
samples.Dataset.get(**kwargs)

Retrieve a single result from a filter operation.

For example, use

    res = Dataset.get(scenario='foo')

instead of

    res = Dataset.filter(scenario='foo')[0]

so that the code does not assume that the arguments select only one result.
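One way to make the "exactly one result" expectation explicit is sketched below. `get_single` is a hypothetical helper, and raising an error on zero or multiple matches is an assumption about sensible behavior, not a statement of what Dataset.get actually does.

```python
def get_single(matches):
    """Return the only element of matches, or raise if there is not exactly one."""
    if len(matches) != 1:
        raise ValueError(f"Expected exactly 1 matching result, found {len(matches)}")
    return matches[0]


# Indexing with [0] silently hides the fact that two results matched
ambiguous = ["result_a", "result_b"]
first = ambiguous[0]

try:
    get_single(ambiguous)
    error_message = None
except ValueError as err:
    error_message = str(err)

# With a single match, the helper simply returns it
unique = get_single(["result_a"])
```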

Samples

samples.Samples(fname, memory_buffer=True, preload=False)

Stores CSV outputs and summary dataframes

To construct, use Samples.new(). To read an existing one, use Samples(fname). The sample files are just ZIP archives with plain text CSV and TXT files so they can be easily accessed externally as well.
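Because the files are ordinary ZIP archives, they can be inspected with standard tools. A minimal sketch using only the Python standard library follows. The archive is built in memory so that the example runs on its own, and the member names (`summary.csv`, `seed_0.csv`, `identifiers.txt`) are made up for illustration; the actual layout inside a samples file may differ.

```python
import csv
import io
import zipfile

# Build a stand-in archive in memory (member names are hypothetical)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("summary.csv", "seed,beta\n0,0.1\n1,0.1\n")
    archive.writestr("seed_0.csv", "t,infected\n0,10\n1,15\n")
    archive.writestr("identifiers.txt", "beta\n")

# Read it back the way an external tool could
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    members = archive.namelist()
    with archive.open("seed_0.csv") as handle:
        text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
        rows = list(csv.DictReader(text))
```

For a real file, `zipfile.ZipFile(fname).namelist()` is a reasonable first step to see which CSV and TXT members it contains.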

Attributes

Name Description
columns Alias summary dataframe columns
id Return a dictionary with the identifiers and associated values
identifier Return tuple identifier for this run
index Alias summary dataframe index
seeds Return array of all seeds

Methods

Name Description
apply Apply/map function to every dataframe
copy Shallow copy - shared cache, copied summary
get Retrieve dataframe and summary row
items Iterate over seeds and dataframes
new
preload Load all dataframes into cache
apply
samples.Samples.apply(fcn, *args, **kwargs)

Apply/map function to every dataframe

The function will be applied to every individual dataframe in the collection.

Parameters
Name Type Description Default
fcn A function to apply. It should take in a dataframe required
args Additional arguments for fcn ()
kwargs Additional arguments for fcn {}

Returns: A list with the output of fcn
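The behavior can be pictured as a map over the stored dataframes. The sketch below is not the library's code: `apply_to_all` and `peak` are hypothetical functions, and lists of row dictionaries stand in for dataframes so that it runs without extra dependencies.

```python
def apply_to_all(frames, fcn, *args, **kwargs):
    """Call fcn(frame, *args, **kwargs) for every frame and collect the outputs in a list."""
    return [fcn(frame, *args, **kwargs) for frame in frames]


def peak(frame, column, scale=1):
    """Example fcn: the largest value in one column, optionally rescaled."""
    return scale * max(row[column] for row in frame)


# Two stand-in "dataframes", one per seed
frames = [
    [{"t": 0, "infected": 10}, {"t": 1, "infected": 15}],
    [{"t": 0, "infected": 12}, {"t": 1, "infected": 9}],
]

peaks = apply_to_all(frames, peak, "infected")            # extra positional argument
scaled = apply_to_all(frames, peak, "infected", scale=2)  # extra keyword argument
```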

copy
samples.Samples.copy()

Shallow copy - shared cache, copied summary

This allows efficient filtering of seeds within runs by removing rows from the copy’s summary, while not reloading or duplicating any of the dataframes in memory.
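The idea of sharing the cache while copying the summary can be sketched with a small stand-in class. `ToySamples` is hypothetical and much simpler than Samples: its summary is a list of row dictionaries and its cache is a plain dictionary keyed by seed.

```python
class ToySamples:
    """Stand-in with a summary (one row per seed) and a cache of loaded data."""

    def __init__(self, summary, cache):
        self.summary = summary
        self.cache = cache

    def copy(self):
        # New list for the summary, same dictionary object for the cache
        return ToySamples(list(self.summary), self.cache)


original = ToySamples(
    summary=[{"seed": 0}, {"seed": 1}, {"seed": 2}],
    cache={0: "data for seed 0", 1: "data for seed 1", 2: "data for seed 2"},
)

# Drop seed 2 from the copy only
subset = original.copy()
subset.summary = [row for row in subset.summary if row["seed"] != 2]
```

The original keeps all three summary rows, and both objects still point at the same cache, so nothing is loaded twice.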

get
samples.Samples.get(seed)

Retrieve dataframe and summary row

Use Samples[seed] to read only the dataframe. Use Samples.get(seed) to read both the dataframe and summary row

items
samples.Samples.items()

Iterate over seeds and dataframes

Example usage:

    res = Samples(…)
    for seed, (row, df) in res.items():
        …

Returns a tuple with:

- seed
- Samples.get(seed), i.e. a tuple with
  - the summary dataframe row for the requested seed
  - the corresponding CSV output for that run
new
samples.Samples.new(folder, outputs, identifiers=None, fname=None, verbose=True)
Parameters
Name Type Description Default
folder The folder name required
outputs A list of tuples (df:pd.DataFrame, summary_row:dict) where the summary row has an entry ‘seed’ for the seed required
identifiers A list of columns to use as identifiers. These should appear in the summary dataframe and should have the same value for all samples. This is useful when generating multiple sets of results e.g., for scenarios (optional) None
preload
samples.Samples.preload()

Load all dataframes into cache

This is done based on the seeds in self.seeds. If some of the seeds are removed prior to preloading, those dataframes will not be loaded.
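The interaction between filtering and preloading can be illustrated with a toy example. `ToyStore` is a hypothetical stand-in, and its `load` method is a placeholder for reading one CSV from the archive.

```python
class ToyStore:
    """Stand-in that loads data lazily, one seed at a time."""

    def __init__(self, seeds):
        self.seeds = list(seeds)
        self.cache = {}

    def load(self, seed):
        # Placeholder for reading one CSV from disk
        return f"data for seed {seed}"

    def preload(self):
        # Only the seeds currently listed are loaded
        for seed in self.seeds:
            if seed not in self.cache:
                self.cache[seed] = self.load(seed)


store = ToyStore(seeds=[0, 1, 2])
store.seeds.remove(2)  # filter first...
store.preload()        # ...so seed 2 is never read
loaded = sorted(store.cache)
```

Filtering before preloading therefore avoids paying the loading cost for seeds that will not be used.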