Multi-Agent Environments

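Two multi-agent environments are provided in the package:

- GeneralSatelliteTasking, which follows the Gym API and passes actions and observations as tuples.
- ConstellationTasking, which implements the PettingZoo parallel API.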

The latter is preferable for multi-agent RL (MARL) settings, as most algorithms are designed for this kind of API.

Configuring the Environment

For this example, a multi-satellite target imaging environment is used. The goal is to maximize the total value of unique images taken.

As usual, the satellite type is defined first.

[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.

[2]:

from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
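
To make the single-plane layout concrete, the sketch below spaces satellites evenly in true anomaly around one orbital plane. It only illustrates the geometry: single_plane_true_anomalies and its offset_deg parameter are hypothetical names, not part of bsk_rl, and walker_delta_args may compute or randomize the phasing differently.

```python
def single_plane_true_anomalies(n_sats, offset_deg=0.0):
    """Evenly space n_sats around one orbital plane (hypothetical helper).

    offset_deg is an assumed stand-in for any common phase shift applied
    to the whole constellation.
    """
    if n_sats < 1:
        raise ValueError("n_sats must be at least 1")
    spacing = 360.0 / n_sats
    return [(offset_deg + k * spacing) % 360.0 for k in range(n_sats)]


anomalies = single_plane_true_anomalies(3)
print(anomalies)  # [0.0, 120.0, 240.0]
```

Shifting every satellite by the same offset keeps the relative spacing fixed, which is one way parameters can vary between episodes while staying correlated across satellites.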

Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.

[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
2026-03-09 19:08:46,205 gym                            INFO       Resetting environment with seed=396220437
2026-03-09 19:08:46,207 scene.targets                  INFO       Generating 1000 targets
2026-03-09 19:08:46,353 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2026-03-09 19:08:46,401 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2026-03-09 19:08:46,453 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2026-03-09 19:08:46,490 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-03-09 19:08:46,526 gym                            INFO       <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a tuple or list with one entry per satellite, in the order the satellites were given to the environment. The step ends the first time any satellite completes its action.

[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2026-03-09 19:08:46,541 gym                            INFO       <0.00> === STARTING STEP ===
2026-03-09 19:08:46,542 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2026-03-09 19:08:46,542 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-818) tasked for imaging
2026-03-09 19:08:46,544 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-818) window enabled: 124.2 to 311.4
2026-03-09 19:08:46,544 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 311.4
2026-03-09 19:08:46,545 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2026-03-09 19:08:46,546 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-102) tasked for imaging
2026-03-09 19:08:46,546 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-102) window enabled: 149.5 to 287.7
2026-03-09 19:08:46,547 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 287.7
2026-03-09 19:08:46,548 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2026-03-09 19:08:46,548 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-75) tasked for imaging
2026-03-09 19:08:46,549 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-75) window enabled: 848.5 to 955.1
2026-03-09 19:08:46,550 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 955.1
2026-03-09 19:08:46,579 sats.satellite.EO-1            INFO       <127.00> EO-1: imaged Target(tgt-818)
2026-03-09 19:08:46,581 data.base                      INFO       <127.00> Total reward: {'EO-1': 0.5911708492407011}
2026-03-09 19:08:46,581 sats.satellite.EO-1            INFO       <127.00> EO-1: Satellite EO-1 requires retasking
2026-03-09 19:08:46,584 gym                            INFO       <127.00> Step reward: 0.5911708492407011
[6]:
observation
[6]:
(array([ 0.78650977, -0.0222807 ,  0.36750812, -0.02047614,  0.48076925,
        -0.0222807 ,  0.12386798, -0.01955821,  0.5111047 , -0.01542022,
         0.64942808, -0.00320596,  0.61203583,  0.00182244,  0.99497589,
         0.00358436,  0.94548049,  0.01293856,  0.38849792,  0.01356003]),
 array([ 9.74678829e-01, -2.22807018e-02,  4.21448073e-01, -2.11792932e-02,
         7.23816195e-01, -1.35553283e-02,  8.54465048e-01, -2.22807018e-02,
         9.91939773e-01, -2.20476321e-02,  8.29827414e-01, -1.60254113e-02,
         9.75289767e-01,  3.94484120e-03,  2.47836681e-01, -8.06938764e-04,
         5.53209004e-01, -4.01404104e-03,  2.06468680e-01,  2.81811644e-02]),
 array([ 2.98858508e-01, -4.23895938e-03,  9.37190943e-01, -3.85381887e-03,
         8.99199884e-01, -2.32804594e-04,  2.74219856e-01,  3.66367613e-02,
         2.31353123e-01,  3.27406458e-02,  6.58326131e-01,  3.23175947e-02,
         4.36117012e-01,  7.18846739e-02,  4.23358909e-01,  6.97674731e-02,
         6.57912499e-01,  1.26585252e-01,  5.08012993e-01,  1.39898055e-01]))
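
Each 20-element vector follows from the observation_spec: two properties (priority and opportunity_open) for each of the n_ahead_observe=10 upcoming targets. The sketch below assumes the properties are interleaved per target, which is consistent with the logged imaging rewards matching the even-indexed entries. unpack_opportunities is a hypothetical helper, not a bsk_rl function.

```python
OPPORTUNITY_OPEN_NORM = 5700.0  # seconds; the norm set in observation_spec


def unpack_opportunities(flat_obs, norm=OPPORTUNITY_OPEN_NORM):
    """Split a flat observation into (priority, seconds_until_open) pairs.

    Assumes the layout [priority_0, open_0, priority_1, open_1, ...].
    """
    if len(flat_obs) % 2 != 0:
        raise ValueError("expected an even number of elements")
    return [
        (flat_obs[i], flat_obs[i + 1] * norm)
        for i in range(0, len(flat_obs), 2)
    ]


# EO-1's observation after the first step, copied from the output above.
eo1_obs = [
    0.78650977, -0.0222807, 0.36750812, -0.02047614, 0.48076925,
    -0.0222807, 0.12386798, -0.01955821, 0.5111047, -0.01542022,
    0.64942808, -0.00320596, 0.61203583, 0.00182244, 0.99497589,
    0.00358436, 0.94548049, 0.01293856, 0.38849792, 0.01356003,
]

opportunities = unpack_opportunities(eo1_obs)
priority, seconds_until_open = opportunities[0]
print(len(opportunities), priority, round(seconds_until_open, 1))
```

A negative opening time means the window is already open. For target index 0 the value is about -127 s, which matches a window that opened at 0.0 s when the current simulation time is 127 s.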

At this point, either every satellite can be retasked, or satellites can continue their previous action by passing None as the action. To see which satellites must be retasked (i.e. their previous action is done and they have nothing more to do), look at "requires_retasking" in each satellite’s info.

[7]:
info
[7]:
{'EO-1': {'requires_retasking': True},
 'EO-2': {'requires_retasking': False},
 'EO-3': {'requires_retasking': False},
 'd_ts': 127.00000000000001}

Based on these flags, only the satellite that requires retasking is given a new action; the others receive None and continue their current actions.

[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[0, None, None]
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2026-03-09 19:08:46,607 gym                            INFO       <127.00> === STARTING STEP ===
2026-03-09 19:08:46,607 sats.satellite.EO-1            INFO       <127.00> EO-1: target index 0 tasked
2026-03-09 19:08:46,608 sats.satellite.EO-1            INFO       <127.00> EO-1: Target(tgt-294) tasked for imaging
2026-03-09 19:08:46,609 sats.satellite.EO-1            INFO       <127.00> EO-1: Target(tgt-294) window enabled: 0.0 to 180.9
2026-03-09 19:08:46,610 sats.satellite.EO-1            INFO       <127.00> EO-1: setting timed terminal event at 180.9
2026-03-09 19:08:46,618 sats.satellite.EO-2            INFO       <152.00> EO-2: imaged Target(tgt-102)
2026-03-09 19:08:46,618 data.base                      INFO       <152.00> Total reward: {'EO-2': 0.9752897669006937}
2026-03-09 19:08:46,619 sats.satellite.EO-2            INFO       <152.00> EO-2: Satellite EO-2 requires retasking
2026-03-09 19:08:46,622 gym                            INFO       <152.00> Step reward: 0.9752897669006937
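
The retasking pattern above can be wrapped in a small helper that works directly on the info dictionary. This is a sketch: retask_actions and choose_action are hypothetical names, and the info literal is copied from the output shown earlier so the block runs without a simulator.

```python
def retask_actions(info, sat_names, choose_action):
    """Return one action per satellite.

    A satellite that requires retasking gets a new action from
    choose_action; any other satellite gets None so that it continues
    its current action.
    """
    actions = []
    for name in sat_names:
        if info[name]["requires_retasking"]:
            actions.append(choose_action(name))
        else:
            actions.append(None)
    return actions


# Copied from the info output after the first step.
info = {
    "EO-1": {"requires_retasking": True},
    "EO-2": {"requires_retasking": False},
    "EO-3": {"requires_retasking": False},
    "d_ts": 127.00000000000001,
}

actions = retask_actions(info, ["EO-1", "EO-2", "EO-3"], lambda name: 0)
print(actions)  # [0, None, None]
```

Iterating over the satellite names rather than over info avoids treating non-satellite keys such as d_ts as agents.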

With the Gym API, the episode terminates if any agent dies. To demonstrate this, one satellite is forcibly killed by overriding its is_alive method so that it reports an empty battery.

[10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2026-03-09 19:08:46,629 gym                            INFO       <152.00> === STARTING STEP ===
2026-03-09 19:08:46,629 sats.satellite.EO-1            INFO       <152.00> EO-1: target index 6 tasked
2026-03-09 19:08:46,630 sats.satellite.EO-1            INFO       <152.00> EO-1: Target(tgt-759) tasked for imaging
2026-03-09 19:08:46,631 sats.satellite.EO-1            INFO       <152.00> EO-1: Target(tgt-759) window enabled: 137.4 to 318.1
2026-03-09 19:08:46,631 sats.satellite.EO-1            INFO       <152.00> EO-1: setting timed terminal event at 318.1
2026-03-09 19:08:46,632 sats.satellite.EO-2            INFO       <152.00> EO-2: target index 7 tasked
2026-03-09 19:08:46,633 sats.satellite.EO-2            INFO       <152.00> EO-2: Target(tgt-972) tasked for imaging
2026-03-09 19:08:46,633 sats.satellite.EO-2            INFO       <152.00> EO-2: Target(tgt-972) window enabled: 287.6 to 401.7
2026-03-09 19:08:46,634 sats.satellite.EO-2            INFO       <152.00> EO-2: setting timed terminal event at 401.7
2026-03-09 19:08:46,635 sats.satellite.EO-3            INFO       <152.00> EO-3: target index 9 tasked
2026-03-09 19:08:46,635 sats.satellite.EO-3            INFO       <152.00> EO-3: Target(tgt-73) tasked for imaging
2026-03-09 19:08:46,636 sats.satellite.EO-3            INFO       <152.00> EO-3: Target(tgt-73) window enabled: 924.4 to 1010.0
2026-03-09 19:08:46,637 sats.satellite.EO-3            INFO       <152.00> EO-3: setting timed terminal event at 1010.0
2026-03-09 19:08:46,650 sats.satellite.EO-1            INFO       <204.00> EO-1: imaged Target(tgt-759)
2026-03-09 19:08:46,651 data.base                      INFO       <204.00> Total reward: {'EO-1': 0.6120358289331573}
2026-03-09 19:08:46,652 sats.satellite.EO-1            INFO       <204.00> EO-1: Satellite EO-1 requires retasking
2026-03-09 19:08:46,653 sats.satellite.EO-1            WARNING    <204.00> EO-1: failed battery_valid check
2026-03-09 19:08:46,655 gym                            INFO       <204.00> Step reward: -0.38796417106684267
2026-03-09 19:08:46,656 gym                            INFO       <204.00> Episode terminated: True
2026-03-09 19:08:46,656 gym                            INFO       <204.00> Episode truncated: False
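
The negative step reward can be reconciled with the logged imaging reward. The numbers below are copied from the log; reading the difference as a failure penalty of 1.0 applied for the dead satellite is an inference from these values, not something the output states.

```python
imaging_reward = 0.6120358289331573   # "Total reward" for EO-1 in the log
step_reward = -0.38796417106684267    # "Step reward" in the log

# Gap between what was earned by imaging and what the step returned.
implied_penalty = imaging_reward - step_reward
print(implied_penalty)
```

The gap comes out to 1.0 within floating-point precision, so the step reward is the imaging reward minus a unit penalty.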

PettingZoo API

The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo documentation for a full description of the API. Its main difference is that observations, actions, rewards, and other step outputs are dictionaries keyed by agent name rather than tuples.

[11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
2026-03-09 19:08:46,664                                WARNING    Creating logger for new env on PID=4038. Old environments in process may now log times incorrectly.
2026-03-09 19:08:46,666 gym                            INFO       Resetting environment with seed=858142052
2026-03-09 19:08:46,669 scene.targets                  INFO       Generating 1000 targets
2026-03-09 19:08:46,806 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2026-03-09 19:08:46,910 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2026-03-09 19:08:46,954 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2026-03-09 19:08:46,994 gym                            INFO       <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the agents property.

[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2026-03-09 19:08:47,007 gym                            INFO       <0.00> === STARTING STEP ===
2026-03-09 19:08:47,008 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2026-03-09 19:08:47,009 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-221) tasked for imaging
2026-03-09 19:08:47,010 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-221) window enabled: 185.6 to 337.0
2026-03-09 19:08:47,010 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 337.0
2026-03-09 19:08:47,011 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2026-03-09 19:08:47,012 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-203) tasked for imaging
2026-03-09 19:08:47,014 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-203) window enabled: 283.9 to 482.6
2026-03-09 19:08:47,015 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 482.6
2026-03-09 19:08:47,016 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2026-03-09 19:08:47,016 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-526) tasked for imaging
2026-03-09 19:08:47,017 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-526) window enabled: 328.6 to 512.3
2026-03-09 19:08:47,018 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 512.3
2026-03-09 19:08:47,067 sats.satellite.EO-1            INFO       <188.00> EO-1: imaged Target(tgt-221)
2026-03-09 19:08:47,068 data.base                      INFO       <188.00> Total reward: {'EO-1': 0.5956082895517771}
2026-03-09 19:08:47,069 sats.satellite.EO-1            INFO       <188.00> EO-1: Satellite EO-1 requires retasking
2026-03-09 19:08:47,072 gym                            INFO       <188.00> Step reward: {'EO-1': 0.5956082895517771}
[14]:
observation
[14]:
{'EO-1': array([ 5.50535475e-01, -1.54573326e-02,  4.76373108e-01, -1.22177861e-02,
         1.88364939e-01,  6.34322392e-05,  9.77546006e-01, -2.13786011e-02,
         1.07427708e-01, -1.07823424e-02,  7.73036807e-01,  8.96859992e-03,
         8.98112894e-01,  1.97772249e-03,  4.28811741e-01,  1.21579538e-05,
         3.96624208e-02,  7.64043476e-03,  5.22807126e-01,  2.18377739e-02]),
 'EO-2': array([ 2.58725685e-01, -2.36061394e-02,  2.79980524e-01, -2.27933909e-02,
         2.48400212e-01, -7.11456757e-04,  1.63505452e-01, -7.85512958e-03,
         9.97022529e-01,  7.35954792e-03,  3.38430876e-01,  1.68285034e-02,
         3.09517371e-01,  1.66848800e-02,  5.81213840e-01,  2.38637998e-02,
         9.89253466e-01,  2.37961357e-02,  6.82803509e-01,  3.51416112e-02]),
 'EO-3': array([ 0.65409521, -0.02867663,  0.60772544,  0.01543238,  0.7816903 ,
        -0.01060059,  0.8094638 , -0.00169946,  0.76970381,  0.02466534,
         0.78292738,  0.03485895,  0.74102032,  0.04361453,  0.96436136,
         0.04767929,  0.13861159,  0.06276282,  0.19372134,  0.06579768])}
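
When a policy was written against the tuple-based Gym API, its outputs can be re-keyed for the PettingZoo API with ordinary dictionary operations. A minimal sketch, assuming agent names are listed in the same order as the tuple entries; to_agent_dict and to_ordered_tuple are hypothetical helpers, not part of either API.

```python
def to_agent_dict(agents, values):
    """Key a sequence of per-satellite values by agent name."""
    if len(agents) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agents, values))


def to_ordered_tuple(agents, keyed):
    """Order a per-agent dictionary as a tuple following the agent list."""
    return tuple(keyed[name] for name in agents)


agents = ["EO-1", "EO-2", "EO-3"]
action_dict = to_agent_dict(agents, [7, 9, 8])
print(action_dict)  # {'EO-1': 7, 'EO-2': 9, 'EO-3': 8}
print(to_ordered_tuple(agents, action_dict))  # (7, 9, 8)
```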

Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.

[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2026-03-09 19:08:47,089 gym                            INFO       <188.00> === STARTING STEP ===
2026-03-09 19:08:47,090 sats.satellite.EO-1            INFO       <188.00> EO-1: target index 7 tasked
2026-03-09 19:08:47,090 sats.satellite.EO-1            INFO       <188.00> EO-1: Target(tgt-170) tasked for imaging
2026-03-09 19:08:47,091 sats.satellite.EO-1            INFO       <188.00> EO-1: Target(tgt-170) window enabled: 188.1 to 392.1
2026-03-09 19:08:47,092 sats.satellite.EO-1            INFO       <188.00> EO-1: setting timed terminal event at 392.1
2026-03-09 19:08:47,092 sats.satellite.EO-2            INFO       <188.00> EO-2: target index 9 tasked
2026-03-09 19:08:47,093 sats.satellite.EO-2            INFO       <188.00> EO-2: Target(tgt-532) tasked for imaging
2026-03-09 19:08:47,094 sats.satellite.EO-2            INFO       <188.00> EO-2: Target(tgt-532) window enabled: 388.3 to 556.6
2026-03-09 19:08:47,095 sats.satellite.EO-2            INFO       <188.00> EO-2: setting timed terminal event at 556.6
2026-03-09 19:08:47,105 sats.satellite.EO-1            INFO       <226.00> EO-1: imaged Target(tgt-170)
2026-03-09 19:08:47,105 data.base                      INFO       <226.00> Total reward: {'EO-1': 0.4288117412894884}
2026-03-09 19:08:47,106 sats.satellite.EO-1            INFO       <226.00> EO-1: Satellite EO-1 requires retasking
2026-03-09 19:08:47,110 gym                            INFO       <226.00> Step reward: {'EO-1': -0.5711882587105116}
2026-03-09 19:08:47,110 gym                            INFO       <226.00> Episode terminated: ['EO-1']
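
After a step like this, a control loop would typically keep acting only for agents that are still live. The sketch below assumes terminated and truncated are dictionaries keyed by agent name, as in the PettingZoo parallel API. The dictionary literals are illustrative values consistent with the log rather than captured output, and live_agents is a hypothetical helper.

```python
def live_agents(agents, terminated, truncated):
    """Return the agents that have neither terminated nor been truncated."""
    return [
        name
        for name in agents
        if not terminated.get(name, False) and not truncated.get(name, False)
    ]


agents = ["EO-1", "EO-2", "EO-3"]
terminated = {"EO-1": True, "EO-2": False, "EO-3": False}   # illustrative
truncated = {"EO-1": False, "EO-2": False, "EO-3": False}   # illustrative

remaining = live_agents(agents, terminated, truncated)
print(remaining)  # ['EO-2', 'EO-3']
```

In the parallel API the environment's own agents list is expected to shrink in the same way, so reading env.agents after the step should give a matching result.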