samples
Create a class for storing a large number of simulations.
Hierarchy:

- result: parameters, seed, beta
- samples: collection of results with same parameters but different seeds
- dataset: collection of samples (with different parameters)
Classes
| Name | Description |
|---|---|
| Samples | Stores CSV outputs and summary dataframes |
Samples
samples.Samples(fname, memory_buffer=True, preload=False)

Stores CSV outputs and summary dataframes
To construct, use Samples.new(). To read an existing one, use Samples(fname). The sample files are just ZIP archives with plain text CSV and TXT files so they can be easily accessed externally as well.
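Because the sample files are plain ZIP archives, the Python standard library is enough to inspect one without this class. The sketch below builds a small stand-in archive in memory and reads it back. The member names `summary.csv` and `seed_0.csv` are hypothetical and chosen only for illustration; the actual layout inside a sample file may differ, so list the members of a real file first.

```python
import csv
import io
import zipfile

# Build a stand-in archive in memory. The member names below are
# hypothetical; use namelist() on a real sample file to see its layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w") as archive:
    archive.writestr("summary.csv", "seed,beta\n0,0.1\n")
    archive.writestr("seed_0.csv", "t,infected\n0,1\n1,3\n")

# Read it back the way an external tool could.
with zipfile.ZipFile(buffer) as archive:
    member_names = archive.namelist()
    with archive.open("seed_0.csv") as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        rows = list(csv.DictReader(text))

print(member_names)
print(rows)
```

The same pattern should work on a file on disk by passing its path to `zipfile.ZipFile` instead of the in-memory buffer.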
Attributes
| Name | Description |
|---|---|
| columns | Alias summary dataframe columns |
| id | Return a dictionary with the identifiers and associated values |
| identifier | Return tuple identifier for this run |
| index | Alias summary dataframe index |
| seeds | Return array of all seeds |
Methods
| Name | Description |
|---|---|
| apply | Apply/map function to every dataframe |
| copy | Shallow copy - shared cache, copied summary |
| get | Retrieve dataframe and summary row |
| items | Iterate over seeds and dataframes |
| new | |
| preload | Load all dataframes into cache |
apply
samples.Samples.apply(fcn, *args, **kwargs)

Apply/map function to every dataframe
The function will be applied to every individual dataframe in the collection.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| fcn | | A function to apply. It should take in a dataframe | required |
| args | | Additional arguments for fcn | () |
| kwargs | | Additional arguments for fcn | {} |
Returns: A list with the output of fcn
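The mapping behaviour described above can be sketched without the class itself. In this sketch, `apply_to_all`, `peak`, and the `frames` dictionary are hypothetical stand-ins (lists of row dictionaries replace dataframes); it only illustrates how extra positional and keyword arguments would be forwarded to `fcn` and how the results are collected into a list.

```python
def apply_to_all(frames, fcn, *args, **kwargs):
    """Hypothetical stand-in: call fcn on each stored table, collect results."""
    return [fcn(frame, *args, **kwargs) for frame in frames.values()]


def peak(frame, column, scale=1):
    """Largest value in one column, optionally rescaled."""
    return scale * max(row[column] for row in frame)


# Stand-in for per-seed dataframes: seed -> list of row dictionaries.
frames = {
    0: [{"infected": 1}, {"infected": 4}],
    1: [{"infected": 2}, {"infected": 3}],
}

# "infected" is passed through as a positional argument, scale as a keyword.
peaks = apply_to_all(frames, peak, "infected", scale=10)
print(peaks)
```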
copy
samples.Samples.copy()

Shallow copy - shared cache, copied summary
This allows efficient filtering of seeds within runs by removing rows from the copy’s summary, while not reloading or duplicating any of the dataframes in memory
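A minimal sketch of the shared-cache, copied-summary idea follows. `ToySamples` is a hypothetical stand-in and not the real class; it uses plain dictionaries in place of the summary dataframe and the dataframe cache, and only shows why filtering the copy leaves the original untouched while no loaded data is duplicated.

```python
import copy


class ToySamples:
    """Hypothetical stand-in, not the real class: a summary plus a cache."""

    def __init__(self, summary, cache):
        self.summary = summary  # seed -> summary row
        self.cache = cache      # seed -> loaded table

    def copy(self):
        # New summary container, same cache object.
        return ToySamples(copy.copy(self.summary), self.cache)


original = ToySamples(
    summary={0: {"beta": 0.1}, 1: {"beta": 0.1}},
    cache={0: "table for seed 0", 1: "table for seed 1"},
)

subset = original.copy()
del subset.summary[1]  # filter a seed out of the copy only

print(sorted(original.summary))        # original summary is unchanged
print(sorted(subset.summary))          # copy has one seed fewer
print(subset.cache is original.cache)  # the cache is shared, not duplicated
```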
get
samples.Samples.get(seed)

Retrieve dataframe and summary row
Use Samples[seed] to read only the dataframe. Use Samples.get(seed) to read both the dataframe and summary row
items
samples.Samples.items()

Iterate over seeds and dataframes
Example usage:

    res = Samples(…)
    for seed, (row, df) in res:
        …
Returns: Tuple with

- seed
- Samples.get(seed), i.e. a tuple with
  - the summary dataframe row for the requested seed
  - the corresponding CSV output for that run
new
samples.Samples.new(folder, outputs, identifiers=None, fname=None, verbose=True)

Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| folder | | The folder name | required |
| outputs | | A list of tuples (df:pd.DataFrame, summary_row:dict) where the summary row has an entry ‘seed’ for the seed | required |
| identifiers | | A list of columns to use as identifiers. These should appear in the summary dataframe and should have the same value for all samples. This is useful when generating multiple sets of results, e.g., for scenarios (optional) | None |
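The requirement that identifier columns hold the same value for all samples can be checked before building a file. The helper below, `check_identifiers`, is a hypothetical illustration and not part of the class; it works on the summary-row dictionaries alone (each with a 'seed' entry, as described for `outputs`) and is a sketch of the constraint, not of how the class enforces it.

```python
def check_identifiers(summary_rows, identifiers):
    """Hypothetical helper: return the shared value of each identifier.

    Raises KeyError if an identifier is missing from a row and
    ValueError if it takes more than one value across rows.
    """
    shared = {}
    for name in identifiers:
        values = {row[name] for row in summary_rows}
        if len(values) != 1:
            raise ValueError(f"identifier {name!r} varies across samples: {values}")
        shared[name] = values.pop()
    return shared


# Summary rows as they might accompany each dataframe in `outputs`.
summary_rows = [
    {"seed": 0, "scenario": "baseline", "beta": 0.1},
    {"seed": 1, "scenario": "baseline", "beta": 0.1},
]

shared = check_identifiers(summary_rows, ["scenario", "beta"])
print(shared)
```

Passing `["seed"]` to this helper raises `ValueError`, since the seed is expected to differ between samples and so cannot serve as an identifier.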
preload
samples.Samples.preload()

Load all dataframes into cache

This is done based on the seeds in self.seeds, so if some seeds are removed prior to preloading, those dataframes will not be loaded.