People, States, and Arrays

Starsim is a framework for creating agent-based models, and the People class is where the agents are stored, so this class is at the heart of any Starsim model. This page describes alternative ways of creating people and gives guidance on adapting these workflows to your needs.

We start with an overview of Starsim’s custom Arr (array) classes, which are separate from but related to the People class and are designed to neatly track data about people.

Starsim States and Arrays

Starsim has a set of custom array classes for recording information about each agent in the population. The two fundamental types of array for storing such information are the BoolState class, which is a Boolean array, and the FloatArr class, which stores numbers (we don’t distinguish between floats and integers, so all numbers are stored in these arrays). Each of these is a subclass of the Starsim Arr class.

The Arr class in Starsim is optimized for three key tasks that are common to almost all Starsim models:

  1. Dynamic growth: as the population grows over time, the arrays are resized dynamically in a way that avoids costly concatenation operations.
  2. Indexing: over time, some agents in the population die. It is desirable for these agents to remain in the arrays so that we can continue to access data about them, but the indexing is set up so that dead agents are automatically excluded from most operations.
  3. Stochastic states: we often want to set the values of a state by sampling from a random variable (e.g. sex might be drawn as a Bernoulli random variable). Starsim’s Arr class can be initialized with a random variable; we will provide examples of this below.
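
To make the first point concrete, the sketch below shows one way an array can grow without concatenating on every birth: it allocates spare slots in bulk and only enlarges the underlying storage when those run out. This is an illustration only, not Starsim's implementation. ToyArr, its grow method, and the 1.5 growth factor are all hypothetical, and plain Python lists stand in for the underlying arrays.

```python
class ToyArr:
    """Toy growable array (hypothetical; not Starsim's implementation)."""

    def __init__(self, n, default=0.0, growth=1.5):
        self.default = default
        self.growth = growth          # Assumed growth factor, chosen for illustration
        self.raw = [default] * n      # Pre-allocated storage
        self.n_used = n               # Number of slots actually in use

    def __len__(self):
        return self.n_used

    def grow(self, n_new):
        """Add n_new agents, enlarging the raw storage only when it is full."""
        needed = self.n_used + n_new
        if needed > len(self.raw):
            new_size = max(needed, int(len(self.raw) * self.growth))
            self.raw.extend([self.default] * (new_size - len(self.raw)))
        self.n_used = needed


arr = ToyArr(1000)
for _ in range(447):    # Add agents one at a time, as births would
    arr.grow(1)

print(len(arr))         # 1447 slots in use
print(len(arr.raw))     # 1500 slots allocated
```

Adding 447 agents here triggers a single enlargement of the storage rather than 447 of them, which is the cost that over-allocation is meant to avoid.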

All agents have a uid (universal identifier), which corresponds to their position in the array. Starsim keeps track of a list of auids (active UIDs), corresponding to agents who are alive or are otherwise participating in the simulation. This way, Starsim knows to skip over agents who have died or been otherwise removed (e.g. through migration) when calculating disease progression, aging, etc.
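
The relationship between uids and auids can be sketched with plain Python lists. The data below is made up for illustration, and the variable names are hypothetical rather than Starsim attributes (apart from auids, which mirrors the term used above):

```python
# Hypothetical per-agent data; the list position is the agent's uid
age   = [34.0, 71.5, 8.0, 55.0, 19.5]
alive = [True, False, True, False, True]

# Active uids: agents still participating in the simulation
auids = [uid for uid, is_alive in enumerate(alive) if is_alive]

# Statistics are computed over active agents only...
mean_age_active = sum(age[uid] for uid in auids) / len(auids)

# ...but data for removed agents is still stored and can be looked up by uid
age_of_agent_1 = age[1]

print(auids)             # [0, 2, 4]
print(mean_age_active)   # 20.5
print(age_of_agent_1)    # 71.5
```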

In most cases, you shouldn’t need to worry about uids, auids, etc. However, this example illustrates how they work:

import sciris as sc
import starsim as ss

sim = ss.Sim(start=2000, stop=2020, n_agents=1000, diseases='sir', networks='random', demographics=True, verbose=False)
sim.init()

sc.heading('Initial state')
ppl = sim.people
print('Number of agents before run:', len(ppl))
print('Maximum UID:', ppl.uid.max())
print('Mean age:', ppl.age.mean())

sc.heading('After running the sim')
sim.run()
res = sim.results
print('Number of agents after run:', len(ppl))
print('Number of agents who were born:', res.births.cumulative[-1])
print('Number of agents who died:', res.cum_deaths[-1])
print('Maximum UID:', ppl.uid.max())
print('Size of the raw arrays:', len(ppl.uid.raw))
print('Mean age of alive agents:', ppl.age.mean())

—————————————
Initial state
—————————————

Number of agents before run: 1000
Maximum UID: 999
Mean age: 30.130148

—————————————————————
After running the sim
—————————————————————

Number of agents after run: 1208
Number of agents who were born: 447.0
Number of agents who died: 231.0
Maximum UID: 1446
Size of the raw arrays: 1500
Mean age of alive agents: 37.37983

Creating default people

When you create a sim, it automatically creates People, and you can use the n_agents argument to control the population size:

import numpy as np
import pandas as pd
import starsim as ss 
sim = ss.Sim(n_agents=1000)  # Create a sim with default people
sim.init()
Initializing sim with 1000 agents
Sim(n=1000; 2000—2050)

The People that are added to the Sim come with the following default states and arrays:

  • alive, a BoolState that records whether each agent is alive
  • female, a BoolState that records whether each agent is female
  • age, a FloatArr that records agent ages
  • ti_dead, a FloatArr that records the time of death, NaN by default
  • scale, a FloatArr that records the number of people that each agent represents; 1 by default.
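
As a small worked example of what scale means, suppose each agent stands in for 200 real people. The values below are hypothetical and the calculation is plain Python rather than a Starsim call; it only illustrates the arithmetic of converting a count of agents into a count of people:

```python
# Hypothetical values: 5 agents, each standing in for 200 real people
scale    = [200.0] * 5
infected = [True, False, True, True, False]

# Count of agents that are infected
n_agents_infected = sum(infected)

# Count of real people those agents represent
n_people_infected = sum(s for s, inf in zip(scale, infected) if inf)

print(n_agents_infected)   # 3
print(n_people_infected)   # 600.0
```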

Creating custom people

Rather than relying on the Sim to create people, you can create your own People and add them to the Sim as a separate argument. The example below is equivalent to the one immediately above:

people = ss.People(1000)
sim = ss.Sim(people=people)

The main reason to create custom people is if you want to specify a particular age/sex distribution. The following example creates a population with the age distribution of Nigeria:

age_data = pd.read_csv('test_data/nigeria_age.csv')
ppl = ss.People(n_agents=10e3, age_data=age_data)
sim = ss.Sim(people=ppl, copy_inputs=False).init()
ppl.plot_ages();
Initializing sim with 10000 agents
Figure(672x480)

Another reason to create custom people is if there are additional attributes that you want to track. Let’s say we want to add a state to track urban/rural status. The example below also illustrates how you can add a stochastic state whose values are sampled from a distribution.

def urban_function(n):
    """ Make a function to randomly assign people to urban/rural locations """ 
    return np.random.choice(a=[True, False], p=[0.5, 0.5], size=n)

urban = ss.BoolState('urban', default=urban_function)
ppl = ss.People(10, extra_states=urban)  # Create 10 people with this state
sim = ss.Sim(people=ppl)
sim.init()  # Initialize the sim --> essential step to create the people and sample states
print(f'Number of urban people: {np.count_nonzero(sim.people.urban)}')
Initializing sim with 10 agents
Number of urban people: 8
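
Seeing 8 urban people out of 10 when the probability is 0.5 may look surprising, but it is within the range of ordinary sampling noise for such a small population. Since urban_function makes 10 independent draws with probability 0.5, the number of urban people follows a binomial distribution. The following sketch (standard library only; binom_pmf is a helper defined here, not part of Starsim) works out how likely that result is:

```python
from math import comb

n, p = 10, 0.5   # Population size and urban probability from the example above

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) draws."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

expected   = n * p
p_exactly8 = binom_pmf(8, n, p)
p_atleast8 = sum(binom_pmf(k, n, p) for k in range(8, n + 1))

print(expected)               # 5.0
print(round(p_exactly8, 4))   # 0.0439
print(round(p_atleast8, 4))   # 0.0547
```

So a count of 8 or more is expected in roughly 5% of runs; with a larger population, the urban fraction would sit much closer to 0.5.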

Modifying People with modules

We saw an example above of adding a custom state to people. However, a far more common way to add states to people is by adding a module to the Sim. All the states of the modules will automatically get added to the main People instance.

ppl = ss.People(30)
sim = ss.Sim(people=ppl, diseases=ss.SIS(init_prev=0.1), networks=ss.RandomNet())
sim.run()
print(f'Number of infected people: {sim.people.sis.infected.sum()}')
Initializing sim with 30 agents
  Running 2000.01.01 ( 0/51) (0.00 s)  ———————————————————— 2%
  Running 2010.01.01 (10/51) (0.01 s)  ••••———————————————— 22%
  Running 2020.01.01 (20/51) (0.02 s)  ••••••••———————————— 41%
  Running 2030.01.01 (30/51) (0.03 s)  ••••••••••••———————— 61%
  Running 2040.01.01 (40/51) (0.05 s)  ••••••••••••••••———— 80%
  Running 2050.01.01 (50/51) (0.06 s)  •••••••••••••••••••• 100%

Number of infected people: 15

When states or arrays are added by modules, they are stored as dictionaries under the name of that module, which is why the infected state in the example above is accessed as sim.people.sis.infected.

Note that the Starsim Arr class can be used like a NumPy array, supporting standard operations such as sum, mean, and counting.

Debugging and analyzing

There are several ways to explore the People object. One way is by exporting to a dataframe:

df = sim.people.to_df()
df.disp()
    uid  slot  alive  female      age  ti_dead  scale  randomnet.participant  sis.susceptible  sis.infected  sis.rel_sus  sis.rel_trans  sis.ti_infected  sis.ti_recovered  sis.immunity
0     0     0   True   False  25.1524      NaN    1.0                  False            False          True       0.0000            1.0             46.0           56.1627        1.5163
1     1     1   True    True   4.9883      NaN    1.0                  False            False          True       0.2294            1.0             42.0           52.0214        0.7706
2     2     2   True   False  58.1494      NaN    1.0                  False            False          True       0.0863            1.0             42.0           52.7087        0.9137
3     3     3   True   False   0.1302      NaN    1.0                  False            False          True       0.0000            1.0             49.0           59.3261        1.4532
4     4     4   True   False  43.8199      NaN    1.0                  False            False          True       0.0000            1.0             43.0           53.9736        1.0737
5     5     5   True   False  42.6909      NaN    1.0                  False             True         False       0.6862            1.0             20.0           29.6344        0.3138
6     6     6   True    True  54.2923      NaN    1.0                  False             True         False       0.2953            1.0             37.0           47.1078        0.7047
7     7     7   True   False  52.6040      NaN    1.0                  False             True         False       0.6314            1.0             25.0           34.3523        0.3686
8     8     8   True   False   7.0769      NaN    1.0                  False            False          True       0.0251            1.0             41.0           51.3925        0.9749
9     9     9   True   False  27.2292      NaN    1.0                  False             True         False       0.1064            1.0             39.0           49.1210        0.8936
10   10    10   True    True   4.0500      NaN    1.0                  False             True         False       0.1828            1.0             34.0           45.8417        0.8172
11   11    11   True   False  42.6985      NaN    1.0                  False            False          True       0.0000            1.0             49.0           57.8087        1.4802
12   12    12   True    True  25.2828      NaN    1.0                  False            False          True       0.0000            1.0             48.0           57.5599        1.4279
13   13    13   True    True  27.6266      NaN    1.0                  False             True         False       0.3926            1.0             36.0           44.9499        0.6074
14   14    14   True   False  24.6072      NaN    1.0                  False            False          True       0.0362            1.0             42.0           51.6577        0.9638
15   15    15   True   False  48.8226      NaN    1.0                  False             True         False       0.1770            1.0             35.0           43.8462        0.8230
16   16    16   True   False  16.9486      NaN    1.0                  False             True         False       0.5447            1.0             28.0           37.6368        0.4553
17   17    17   True    True  19.6193      NaN    1.0                  False             True         False       0.5025            1.0             31.0           40.2820        0.4975
18   18    18   True    True  40.8453      NaN    1.0                  False            False          True       0.0000            1.0             43.0           51.2966        1.1735
19   19    19   True   False  25.0690      NaN    1.0                  False             True         False       0.4052            1.0             35.0           47.0952        0.5948
20   20    20   True   False  24.1976      NaN    1.0                  False            False          True       0.0000            1.0             41.0           50.5722        1.1485
21   21    21   True   False  27.8875      NaN    1.0                  False            False          True       0.5276            1.0             50.0           61.8450        1.4724
22   22    22   True    True   7.1277      NaN    1.0                  False             True         False       0.5071            1.0             32.0           42.5898        0.4929
23   23    23   True   False  35.5697      NaN    1.0                  False             True         False       0.5034            1.0             30.0           40.4424        0.4966
24   24    24   True    True  36.3433      NaN    1.0                  False             True         False       0.3615            1.0             37.0           46.3120        0.6385
25   25    25   True   False  58.3792      NaN    1.0                  False             True         False       0.5156            1.0             30.0           39.2283        0.4844
26   26    26   True    True   1.6279      NaN    1.0                  False             True         False       0.6753            1.0             22.0           31.9325        0.3247
27   27    27   True   False  49.9440      NaN    1.0                  False            False          True       0.0966            1.0             50.0           60.1813        1.9034
28   28    28   True   False  50.4779      NaN    1.0                  False            False          True       0.0000            1.0             40.0           51.4462        1.1485
29   29    29   True   False  39.0536      NaN    1.0                  False            False          True       0.2576            1.0             50.0           59.1780        1.7424

This is usually too much information to take in directly, but it can be useful for producing summary statistics. For example, let’s say we want to understand the relationship between time of recovery and immunity:

import matplotlib.pyplot as plt
plt.scatter(df['sis.ti_recovered'], df['sis.immunity'])
plt.xlabel('Time of recovery')
plt.ylabel('Immunity')
plt.show()
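
A scatter plot is one kind of summary; a grouped average is another. As a minimal sketch, the snippet below compares mean immunity between infected and uninfected agents, using the first eight rows of the table above copied in by hand so that it runs on its own (the variable names are illustrative):

```python
from statistics import mean

# (sis.infected, sis.immunity) for agents 0-7, copied from the table above
rows = [
    (True,  1.5163), (True,  0.7706), (True,  0.9137), (True,  1.4532),
    (True,  1.0737), (False, 0.3138), (False, 0.7047), (False, 0.3686),
]

mean_infected     = mean(imm for inf, imm in rows if inf)
mean_not_infected = mean(imm for inf, imm in rows if not inf)

print(round(mean_infected, 4))       # 1.1455
print(round(mean_not_infected, 4))   # 0.4624
```

On the full dataframe, and assuming it supports the usual pandas interface, the same comparison could be written as df.groupby('sis.infected')['sis.immunity'].mean().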

Sometimes we want to explore a single agent in more detail. For this, there is a person() method, which will return all the attributes of that particular agent (equivalent to a single row in the dataframe):

sim.people.person(10)
#0. 'uid':                   10
#1. 'slot':                  10
#2. 'alive':                 True
#3. 'female':                True
#4. 'age':                   4.050025
#5. 'ti_dead':               nan
#6. 'scale':                 1.0
#7. 'randomnet.participant': False
#8. 'sis.susceptible':       True
#9. 'sis.infected':          False
#10. 'sis.rel_sus':           0.18277001
#11. 'sis.rel_trans':         1.0
#12. 'sis.ti_infected':       34.0
#13. 'sis.ti_recovered':      45.841686
#14. 'sis.immunity':          0.81723