Distributions

Agent-based models draw random numbers from distributions to simulate various phenomena. Realizations (also called draws or samples) from distributions are used to model everything from the duration of each infection to the number of contacts each individual has in a network to the outcomes of individual diagnostic tests. The Starsim framework includes state of the art functionality for specifying and sampling from statistical distributions.

In this guide, you will:

One key advantage of Starsim distributions is that they enable low-variance comparison between simulations through a technique called “common random numbers” (CRN). To learn more about CRN, see the Common Random Numbers guide.

Available distributions

The starsim framework includes a wide range of distributions that can be used to model various phenomena. The following distributions are available:

Distribution Starsim Dist Parameters Description
Random random None Random number between 0 and 1.
Uniform uniform low, high Random number between a specified minimum and maximum.
Normal normal loc (mean), scale (standard deviation) Random number from a normal (Gaussian) distribution with specified mean and standard deviation.
Lognormal (implicit) lognormal_im mean, sigma (of the underlying normal distribution) Random number from a lognormal distribution specified by the mean and standard deviation of the underlying normal distribution.
Lognormal (explicit) lognormal_ex mean, std (of the distribution itself; mean must be positive) Random number from a lognormal distribution specified by the mean and standard deviation of the distribution itself.
Exponential expon scale (mean of the distribution, equivalent to 1/λ) Random number from an exponential distribution with specified scale parameter.
Poisson poisson lam Random number from a Poisson distribution with specified rate parameter.
Negative Binomial nbinom n, p (number of successes, probability of success) Random number from a negative binomial distribution with specified parameters.
Weibull weibull c (shape parameter, sometimes called “k”), loc (shifts the distribution), scale (sometimes called “λ”) Random number from a Weibull distribution with specified parameters, uses scipy’s weibull_min.
Gamma gamma a (shape parameter, sometimes called “k”), loc (shifts the distribution), scale (sometimes called “θ”) Random number from a gamma distribution with specified shape and scale parameters.
Beta beta_dist a (shape parameter, must be > 0), b (shape parameter, must be > 0) Random number from a beta distribution with specified shape parameters.
Constant constant v Constant value, useful for fixed parameters and testing.
Random Integer randint low, high Random integer between a specified minimum and maximum.
Bernoulli bernoulli p Random number from a Bernoulli distribution with specified probability of success. Bernoulli distributions are used frequently to determine binary outcomes, such as whether an agent is infected or not.
Choice choice a (int or list of choices), p (probability of each choice) Randomly select from a list of values. This distribution only supports fixed parameters.
Histogram histogram values, bins, density, data Random number from a histogram distribution specified by a list of values and their corresponding probabilities.

See the API reference for more details.

You can also create a custom distribution by extending the Dist class from Starsim.

How to use Starsim distributions

Parameters for Starsim distributions can have fixed parameters (the same for all agents) or vary dynamically. We will start with a simple normal distribution with fixed parameters:

import starsim as ss
import numpy as np

# Create a Normal distribution with mean 0 and standard deviation 1
# The "strict" flag is needed only for this example to avoid warnings
d = ss.normal(name="Normal with fixed parameters", loc=0, scale=1, strict=False)
# ^^^ The above "d" object is the "frozen" distribution.

print(d)
ss.normal(loc=0, scale=1)

Drawing samples

Samples from a distribution can be drawn using the rvs method.

While possible to request a specific number of samples, as shown below, please know that this is not the preferred way to use distribution in Starsim:

draws = d.rvs(5)
print(draws)
[-0.98800114  1.40033713 -0.00723223 -2.0299323  -1.02095412]

The above approach is better than calling numpy.random directly, but still not ideal because it does not allow for dynamic parameter nor low-noise sampling. Therefore, please instead call the rvs method with the unique identifiers (uids) of the agents for which you need samples. Alternatively, you can pass a boolean mask of length equal to the number of agents in the simulation:

 # Create a simulation for context
 # [you will not need to do this when working within the framework]
sim = ss.Sim(n_agents=10).init()

# Specify the distribution, here a random integer between -10 and 10
d_sim = ss.randint(name="Random integer with fixed parameters", low=-10, high=10)
d_sim.init(sim=sim) # Initialize the distribution [done automatically in the framework]

# Instead of requesting 5 random numbers, the preferred pattern is to request
# random numbers for specific agents by passing a list of agent IDs. Doing so
# enable the powerful dynamic parameters to be used while also supporting the
# low-noise common random number sampling.
draws = d_sim.rvs([3, 5, 2, 9, 4]) # Draw samples for specific agents by UID
print(f"Draws for agents 3, 5, 2, 9, and 4: {draws}")

mask = sim.people.age < 25
draws_mask = d_sim.rvs(mask) # Draw samples for agents under 25 from a boolean mask
print(f"Draws for agents under 25: {draws_mask}")

draws_all = d_sim.rvs(sim.people.uid) # Draw samples for all agents
print(f"Draws for all agents (0, 1, 2, ..., n_agents): {draws_all}")
Initializing sim with 10 agents
Draws for agents 3, 5, 2, 9, and 4: [ 2 -9  1 -8  7]
Draws for agents under 25: [-10  -1   9]
Draws for all agents (0, 1, 2, ..., n_agents): [ 0  6  4 -9  3 -8  2  6 -9  8]

Visualizing distributions

Let’s take a look at the distribution by plotting a histogram of samples:

rvs = d.plot_hist(n=1000, bins=30)

Extracting distribution statistics

We can quickly calculate statistics of the distribution by accessing the underlying “dist” property of the distribution:

underlying_dist = d.dist
print("Mean:", underlying_dist.mean())
print("Standard Deviation:", underlying_dist.std())
print("95% Interval:", underlying_dist.ppf([0.025, 0.975]))
Mean: 0.0
Standard Deviation: 1.0
95% Interval: [-1.95996398  1.95996398]

Setting model parameters that are distributions

Many Starsim modules directly expose distributions as parameters for user customization. For example, the SIR disease module has three parameters are distributions:

Parameter Meaning Default Value
init_prev Initial prevalence of infection ss.bernoulli(p=0.01)
dur_inf Duration of infection ss.lognorm_ex(mean=ss.dur(6))
p_death Probability of death given infection ss.bernoulli(p=0.01)

You can change these parameters away from their defaults by passing in a different distribution, as demonstrated below:

sir = ss.SIR(
    # Set the init_prev parameter to randomly infect 15% of the population
    init_prev = ss.bernoulli(p=0.15),

    # Set the dur_inf parameter to a Weibull
    dur_inf = ss.weibull(c=2, loc=1, scale=2)
)

# Alternatively, update parameters of a default distribution
sir.pars.p_death.set(p=0.5) # Update the death probability to 50%

# Create, run, and plot a sim using the SIR disease model
sim = ss.Sim(n_agents=100, diseases=sir, dur=15, dt=1, start=0, networks=ss.RandomNet())
sim.run().plot()
Initializing sim with 100 agents
  Running 0.0 ( 0/16) (0.00 s)  •——————————————————— 6%
  Running 10.0 (10/16) (0.12 s)  •••••••••••••——————— 69%
Figure(768x576)

It’s easy to inspect distributions, for example dur_inf of the SIR module using plot_hist.

NOTE: in the code below that we access dur_inf at sim.diseases.sir.pars rather than at sir.pars (referencing the module in the previous cell) because Starsim makes a copy. The sir is not initialized and does not have the sim context.

rvs = sim.diseases.sir.pars.dur_inf.plot_hist()

Using distributions with dynamic parameters

Up to this point, the distributions we have used have had fixed parameters. Every agent draws will draw from the same distribution at every time step. But Starsim has a powerful feature to specify distributions with dynamic parameters that can change based on agent attributes, simulation time, or other factors.

Let’s continue with the SIR example, but make the initial prevalence of infection depend on the age of the agent. Instead of a fixed probability for all agents, let’s make it so that children under 15 years old have a 25% chance of being selected as a “seed” infection and adults over 15 years old have a 10% chance.

To implement dynamic parameters, we set the parameter (p in the case of a Bernoulli distribution) to a callable like a function. The callable must accept three arguments:

  1. self: The module that is calling the distribution
  2. sim: The simulation object
  3. uids: The unique IDs of the agents being processed

It should return a numpy array of the same length as uids containing the parameter value for each agent, or a scalar if the parameter is the same for all agents.

def set_p_by_age(self, sim, uids):
    p = np.full(len(uids), fill_value=0.10)  # Default 10% infection probability
    p[sim.people.age < 15] = 0.25  # 25% for ages < 15
    
    # For demonstration, print the age and probability for each agent:
    for uid, age, prob in zip(uids, sim.people.age[uids], p):
        print(f"Agent {uid} | Age: {age:5.2f} |  Infection Seed Probability: {prob:.0%}")
    return p

sir = ss.SIR(
    # Set init_prev as a dynamic parameter
    init_prev = ss.bernoulli(p=set_p_by_age),
)

# Create, run, and plot a sim using the SIR disease model
sim = ss.Sim(n_agents=10, diseases=[sir], dur=30)
sim.run()
Initializing sim with 10 agents
Agent 0 | Age: 25.15 |  Infection Seed Probability: 10%
Agent 1 | Age:  4.99 |  Infection Seed Probability: 25%
Agent 2 | Age: 58.15 |  Infection Seed Probability: 10%
Agent 3 | Age:  0.13 |  Infection Seed Probability: 25%
Agent 4 | Age: 43.82 |  Infection Seed Probability: 10%
Agent 5 | Age: 42.69 |  Infection Seed Probability: 10%
Agent 6 | Age: 54.29 |  Infection Seed Probability: 10%
Agent 7 | Age: 52.60 |  Infection Seed Probability: 10%
Agent 8 | Age:  7.08 |  Infection Seed Probability: 25%
Agent 9 | Age: 27.23 |  Infection Seed Probability: 10%
  Running 2000.01.01 ( 0/31) (0.00 s)  ———————————————————— 3%
  Running 2010.01.01 (10/31) (0.00 s)  •••••••————————————— 35%
  Running 2020.01.01 (20/31) (0.00 s)  •••••••••••••——————— 68%
  Running 2030.01.01 (30/31) (0.01 s)  •••••••••••••••••••• 100%
Sim(n=10; 2000—2030; diseases=sir)

Example Starsim intervention using dynamic distributions

To demonstrate the use and power of distributions in the Starsim framework, we next create an intervention that delivers a vaccine to a random subset of agents using a Bernoulli distribution. The default distribution will be fixed, the same for all agents, but we’ll make it a parameter to that the user can change it without having to modify the source code.

NOTE: That we create the distribution only one time in the __init__ method of the intervention. Here, the distribution is part of the parameters, so will appear at self.pars.p_vx. Alternatively, we could create a distribution between update_pars and return, as noted. Please avoid recreating a Starsim distribution on every time step.

class MyVx(ss.Intervention):
    def __init__(self, pars=None, **kwargs):
        super().__init__()
        self.define_states(
            ss.BoolState('vaccinated', label="Vaccinated", default=False)
        )
        self.define_pars(
            # Create a Bernoulli distribution as a module parameter
            # The value p=0.1 is a placeholder, the user will override
            p_vx = ss.bernoulli(p=0.1, name="Vaccination Probability")
        )
        self.update_pars(pars=pars, **kwargs)

        # NOTE, this is a great place to create other distributions, for example
        # self.my_dist = ss.normal(loc=0, scale=1)

        return

    def init_results(self):
        super().init_results()
        self.define_results(
            ss.Result('new_vx', dtype=int, label="Newly Vaccinated")
        )
        return

    def step(self):
        # Choose which agents to vaccinate
        novx = self.vaccinated == False # Boolean mask
        # Filter to select agents for which the Bernoulli sample is True
        vx_uids = self.pars.p_vx.filter(novx) # <-- can pass in mask or uids
        self.vaccinated[vx_uids] = True  # Set the state to vaccinated
        sim.diseases.sir.rel_sus[vx_uids] = 0.0  # Set susceptibility to 0 for vaccinated agents
        
        # Store the results
        self.results.new_vx[sim.ti] = len(vx_uids)

Now that we have created the intervention, we can configure it with a custom p_vx_func in “user space” and run the simulation.

def p_vx_func(self, sim, uids):
    # Set the probability of vaccination for each agent in uids
    # This is the "dynamic" callable for p_vx

    # Let's only administer the vaccine on the 5th time step
    if sim.ti != 5:
        return 0.0 

    # Set vaccination probability proportional to age
    p = sim.people.age[uids] / 100.0  # Normalize age to a probability between 0 and 1
    p = p.clip(0.0, 1.0)  # Ensure probabilities are between 0 and 1
    return p # p has the same length as uids


vx_intv = MyVx(p_vx = ss.bernoulli(p=p_vx_func))

# Create, run, and plot a sim using the SIR disease model
sim = ss.Sim(n_agents=1000, dur=10, dt=1, start=0,
    diseases=ss.SIR(), # Default SIR disease model
    interventions=[vx_intv],
    networks=ss.RandomNet()
)
sim.run().plot('myvx') # Verify vaccinations only on the 5th time step
Initializing sim with 1000 agents
  Running 0.0 ( 0/11) (0.00 s)  •——————————————————— 9%
  Running 10.0 (10/11) (0.02 s)  •••••••••••••••••••• 100%

Figure(768x576)

See how vaccines were delivered only on time step 5? If we were to look at which agents received the vaccine, we would see probability increasing with age.