Multi-Agent Environments

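Two multi-agent environments are provided in the package: GeneralSatelliteTasking, which follows the Gym API, and ConstellationTasking, which implements the PettingZoo parallel API.

The latter is preferable for multi-agent RL (MARL) settings, as most algorithms are designed for this kind of API.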

Configuring the Environment

For this example, a multi-satellite target imaging environment will be used. The goal is to maximize the total value of unique images taken.

As usual, the satellite type is defined first.

[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.

[2]:

from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
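
To make the "single-plane Walker-delta" idea concrete: with one plane, the pattern reduces to satellites spaced evenly in phase around a shared orbit. The sketch below computes that spacing in plain Python. The helper single_plane_phases is hypothetical and is not part of bsk_rl; the arguments actually produced by walker_delta_args may be parameterized differently.

```python
def single_plane_phases(n_sats, offset_deg=0.0):
    """Return evenly spaced in-plane phase angles (degrees) for n_sats satellites.

    Hypothetical illustration only; not a bsk_rl function.
    """
    if n_sats <= 0:
        raise ValueError("n_sats must be positive")
    spacing = 360.0 / n_sats
    return [(offset_deg + i * spacing) % 360.0 for i in range(n_sats)]


# Three satellites in one plane, as in this example's constellation.
phases = single_plane_phases(3)
print(phases)
```

Passing a random offset_deg would shift all satellites together, which is one way to picture parameters being randomized in a correlated manner.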

Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.

[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
2026-02-03 17:26:22,926 gym                            INFO       Resetting environment with seed=1932730903
2026-02-03 17:26:22,929 scene.targets                  INFO       Generating 1000 targets
2026-02-03 17:26:23,078 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2026-02-03 17:26:23,121 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2026-02-03 17:26:23,168 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2026-02-03 17:26:23,209 gym                            INFO       <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a sequence with one entry per satellite (a list is used here). The step ends the first time any satellite completes an action.

[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2026-02-03 17:26:23,225 gym                            INFO       <0.00> === STARTING STEP ===
2026-02-03 17:26:23,225 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2026-02-03 17:26:23,226 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-613) tasked for imaging
2026-02-03 17:26:23,227 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-613) window enabled: 222.6 to 420.3
2026-02-03 17:26:23,228 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 420.3
2026-02-03 17:26:23,230 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2026-02-03 17:26:23,230 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-74) tasked for imaging
2026-02-03 17:26:23,231 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-74) window enabled: 149.5 to 358.7
2026-02-03 17:26:23,231 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 358.7
2026-02-03 17:26:23,232 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2026-02-03 17:26:23,233 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-141) tasked for imaging
2026-02-03 17:26:23,235 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-141) window enabled: 338.8 to 544.7
2026-02-03 17:26:23,235 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 544.7
2026-02-03 17:26:23,275 sats.satellite.EO-2            INFO       <152.00> EO-2: imaged Target(tgt-74)
2026-02-03 17:26:23,277 data.base                      INFO       <152.00> Total reward: {'EO-2': 0.19758842018565925}
2026-02-03 17:26:23,277 sats.satellite.EO-2            INFO       <152.00> EO-2: Satellite EO-2 requires retasking
2026-02-03 17:26:23,280 gym                            INFO       <152.00> Step reward: 0.19758842018565925
[6]:
observation
[6]:
(array([ 3.99709156e-01, -1.36914678e-03,  7.58172658e-01, -1.10367435e-02,
         4.82002916e-02, -1.19071323e-02,  2.28229424e-01, -6.18822827e-03,
         4.75299342e-01, -1.17282124e-04,  6.41514382e-01,  1.23917573e-02,
         4.14474675e-01,  3.55804860e-02,  5.11330962e-01,  4.49319223e-02,
         2.55815314e-01,  3.90886797e-02,  5.73726836e-02,  4.74696052e-02]),
 array([ 0.5632466 , -0.01982393,  0.65261174, -0.02285464,  0.21121855,
        -0.01703963,  0.06903315, -0.01060437,  0.53333908,  0.00211502,
         0.96059295,  0.03054874,  0.03467488,  0.03956484,  0.99056449,
         0.0609207 ,  0.13464547,  0.0605077 ,  0.57038472,  0.06074112]),
 array([ 0.44977675, -0.02607011,  0.39811349,  0.00757611,  0.91335633,
         0.0152419 ,  0.55782266,  0.02050291,  0.56378368,  0.02884206,
         0.08986576,  0.03277846,  0.05574087,  0.07781134,  0.70437044,
         0.05215251,  0.1074471 ,  0.07123636,  0.57515036,  0.04845999]))
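
Each 20-element observation comes from the two properties in observation_spec ("priority" and "opportunity_open") over n_ahead_observe=10 upcoming targets. The values above appear to be interleaved per target, as (priority, opportunity_open / 5700) pairs; this layout is inferred from the printed numbers rather than stated by the source. Under that assumption, a minimal sketch for unpacking EO-1's observation (values rounded) is:

```python
# EO-1's observation from the output above, rounded for brevity.
eo1_obs = [
    0.3997, -0.0014, 0.7582, -0.0110, 0.0482, -0.0119, 0.2282, -0.0062,
    0.4753, -0.0001, 0.6415, 0.0124, 0.4145, 0.0356, 0.5113, 0.0449,
    0.2558, 0.0391, 0.0574, 0.0475,
]

N_PROPS = 2    # "priority" and "opportunity_open" in observation_spec
NORM = 5700.0  # norm applied to "opportunity_open" in observation_spec

# Assumed layout: the two properties are interleaved for each upcoming target.
targets = [
    {"priority": eo1_obs[i], "opens_in_s": eo1_obs[i + 1] * NORM}
    for i in range(0, len(eo1_obs), N_PROPS)
]

for index, target in enumerate(targets):
    print(index, target)
```

Under this reading, the list index of each pair matches the action index for act.Image, and a negative opening time would indicate a window that is already open.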

At this point, either every satellite can be retasked, or satellites can continue their previous action by passing None as the action. To see which satellites must be retasked (i.e. their previous action is done and they have nothing more to do), look at "requires_retasking" in each satellite’s info.

[7]:
info
[7]:
{'EO-1': {'requires_retasking': False},
 'EO-2': {'requires_retasking': True},
 'EO-3': {'requires_retasking': False},
 'd_ts': 152.0}

Based on this information, only the satellite that requires retasking is given a new action here; the others continue their current action.

[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[None, 0, None]
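
The same pattern can be wrapped in a small helper that takes a per-satellite policy. This is a sketch in plain Python using a copy of the info dictionary shown above; build_actions and choose_action are hypothetical names, not part of bsk_rl.

```python
def build_actions(info, satellite_names, choose_action):
    """Return one entry per satellite: a new action if retasking is required, else None."""
    return [
        choose_action(name) if info[name]["requires_retasking"] else None
        for name in satellite_names
    ]


# Copy of the info dict shown above, including the non-satellite "d_ts" entry.
info = {
    "EO-1": {"requires_retasking": False},
    "EO-2": {"requires_retasking": True},
    "EO-3": {"requires_retasking": False},
    "d_ts": 152.0,
}

# Iterating over satellite names (not over info) avoids tripping on "d_ts".
actions = build_actions(info, ["EO-1", "EO-2", "EO-3"], choose_action=lambda name: 0)
print(actions)
```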
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2026-02-03 17:26:23,302 gym                            INFO       <152.00> === STARTING STEP ===
2026-02-03 17:26:23,303 sats.satellite.EO-2            INFO       <152.00> EO-2: target index 0 tasked
2026-02-03 17:26:23,304 sats.satellite.EO-2            INFO       <152.00> EO-2: Target(tgt-780) tasked for imaging
2026-02-03 17:26:23,305 sats.satellite.EO-2            INFO       <152.00> EO-2: Target(tgt-780) window enabled: 39.0 to 179.4
2026-02-03 17:26:23,306 sats.satellite.EO-2            INFO       <152.00> EO-2: setting timed terminal event at 179.4
2026-02-03 17:26:23,314 sats.satellite.EO-2            INFO       <180.00> EO-2: timed termination at 179.4 for Target(tgt-780) window
2026-02-03 17:26:23,315 data.base                      INFO       <180.00> Total reward: {}
2026-02-03 17:26:23,315 sats.satellite.EO-2            INFO       <180.00> EO-2: Satellite EO-2 requires retasking
2026-02-03 17:26:23,319 gym                            INFO       <180.00> Step reward: 0.0

In this environment, the episode terminates if any agent dies. To demonstrate this, one satellite is forcibly killed.

[10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2026-02-03 17:26:23,325 gym                            INFO       <180.00> === STARTING STEP ===
2026-02-03 17:26:23,326 sats.satellite.EO-1            INFO       <180.00> EO-1: target index 6 tasked
2026-02-03 17:26:23,326 sats.satellite.EO-1            INFO       <180.00> EO-1: Target(tgt-462) tasked for imaging
2026-02-03 17:26:23,328 sats.satellite.EO-1            INFO       <180.00> EO-1: Target(tgt-462) window enabled: 354.8 to 524.3
2026-02-03 17:26:23,328 sats.satellite.EO-1            INFO       <180.00> EO-1: setting timed terminal event at 524.3
2026-02-03 17:26:23,330 sats.satellite.EO-2            INFO       <180.00> EO-2: target index 7 tasked
2026-02-03 17:26:23,330 sats.satellite.EO-2            INFO       <180.00> EO-2: Target(tgt-431) tasked for imaging
2026-02-03 17:26:23,331 sats.satellite.EO-2            INFO       <180.00> EO-2: Target(tgt-431) window enabled: 496.9 to 600.0
2026-02-03 17:26:23,331 sats.satellite.EO-2            INFO       <180.00> EO-2: setting timed terminal event at 600.0
2026-02-03 17:26:23,332 sats.satellite.EO-3            INFO       <180.00> EO-3: target index 9 tasked
2026-02-03 17:26:23,333 sats.satellite.EO-3            INFO       <180.00> EO-3: Target(tgt-128) tasked for imaging
2026-02-03 17:26:23,334 sats.satellite.EO-3            INFO       <180.00> EO-3: Target(tgt-128) window enabled: 428.2 to 600.0
2026-02-03 17:26:23,335 sats.satellite.EO-3            INFO       <180.00> EO-3: setting timed terminal event at 600.0
2026-02-03 17:26:23,373 sats.satellite.EO-1            INFO       <357.00> EO-1: imaged Target(tgt-462)
2026-02-03 17:26:23,374 data.base                      INFO       <357.00> Total reward: {'EO-1': 0.4144746749367094}
2026-02-03 17:26:23,375 sats.satellite.EO-1            INFO       <357.00> EO-1: Satellite EO-1 requires retasking
2026-02-03 17:26:23,375 sats.satellite.EO-1            INFO       <357.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-02-03 17:26:23,421 sats.satellite.EO-2            INFO       <357.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-02-03 17:26:23,470 sats.satellite.EO-3            INFO       <357.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-02-03 17:26:23,522 sats.satellite.EO-1            WARNING    <357.00> EO-1: failed battery_valid check
2026-02-03 17:26:23,524 gym                            INFO       <357.00> Step reward: -0.5855253250632906
2026-02-03 17:26:23,525 gym                            INFO       <357.00> Episode terminated: True
2026-02-03 17:26:23,525 gym                            INFO       <357.00> Episode truncated: False
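
Note that the final step reward is negative even though EO-1 imaged a target during the step. The logged numbers are consistent with a penalty of about -1.0 being applied when the satellite fails; this is an inference from the log above, not something the log states directly. The arithmetic can be checked as follows:

```python
imaging_reward = 0.4144746749367094  # "Total reward" logged for EO-1
step_reward = -0.5855253250632906    # "Step reward" logged for the same step

# Difference between what the step returned and what imaging earned.
implied_penalty = step_reward - imaging_reward
print(implied_penalty)
```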

PettingZoo API

ConstellationTasking, the PettingZoo parallel API environment, is largely the same as GeneralSatelliteTasking. See the PettingZoo documentation for a full description of the API. It generally organizes values into dictionaries keyed by agent name, rather than tuples.

[11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
2026-02-03 17:26:23,533                                WARNING    Creating logger for new env on PID=4396. Old environments in process may now log times incorrectly.
2026-02-03 17:26:23,535 gym                            INFO       Resetting environment with seed=2541943598
2026-02-03 17:26:23,537 scene.targets                  INFO       Generating 1000 targets
2026-02-03 17:26:23,665 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2026-02-03 17:26:23,711 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2026-02-03 17:26:23,758 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2026-02-03 17:26:23,802 gym                            INFO       <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the agents property.

[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2026-02-03 17:26:23,815 gym                            INFO       <0.00> === STARTING STEP ===
2026-02-03 17:26:23,816 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2026-02-03 17:26:23,817 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-9) tasked for imaging
2026-02-03 17:26:23,817 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-9) window enabled: 250.7 to 382.3
2026-02-03 17:26:23,818 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 382.3
2026-02-03 17:26:23,819 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2026-02-03 17:26:23,819 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-875) tasked for imaging
2026-02-03 17:26:23,820 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-875) window enabled: 242.0 to 442.8
2026-02-03 17:26:23,821 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 442.8
2026-02-03 17:26:23,822 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2026-02-03 17:26:23,822 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-175) tasked for imaging
2026-02-03 17:26:23,824 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-175) window enabled: 89.1 to 298.1
2026-02-03 17:26:23,825 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 298.1
2026-02-03 17:26:23,846 sats.satellite.EO-3            INFO       <92.00> EO-3: imaged Target(tgt-175)
2026-02-03 17:26:23,847 data.base                      INFO       <92.00> Total reward: {'EO-3': 0.950256065736352}
2026-02-03 17:26:23,848 sats.satellite.EO-3            INFO       <92.00> EO-3: Satellite EO-3 requires retasking
2026-02-03 17:26:23,852 gym                            INFO       <92.00> Step reward: {'EO-3': 0.950256065736352}
[14]:
observation
[14]:
{'EO-1': array([ 0.58302244, -0.01614035,  0.78437351, -0.01614035,  0.48832746,
        -0.001211  ,  0.99408101,  0.00779533,  0.26725864,  0.00535268,
         0.26121685,  0.01362218,  0.00124746,  0.02784112,  0.67324512,
         0.0290539 ,  0.71806497,  0.04448813,  0.53286583,  0.07095274]),
 'EO-2': array([ 0.39197767, -0.00992116,  0.02959725, -0.01157818,  0.22179317,
        -0.01614035,  0.74873832, -0.01304684,  0.7961568 ,  0.00757432,
         0.26070417, -0.00138027,  0.00541328,  0.00865512,  0.57113934,
         0.03169924,  0.82468773,  0.02632139,  0.83950077,  0.03931078]),
 'EO-3': array([ 1.04509884e-01, -1.61403509e-02,  4.63988603e-01, -1.85805172e-03,
         8.28862066e-01, -1.61403509e-02,  6.39872524e-01,  1.03746729e-02,
         5.34455121e-01, -4.92439484e-03,  6.59897336e-01, -8.42248133e-04,
         4.23951119e-01,  2.92096615e-03,  1.69138335e-01,  1.41318256e-02,
         1.93309032e-01,  2.08829596e-02,  8.78029920e-01,  7.11050240e-02])}
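
Because the two environments differ mainly in how per-agent values are packaged, converting between the tuple form and the dictionary form is a matter of pairing values with agent names. The helpers below are a hypothetical sketch in plain Python and are not provided by bsk_rl or PettingZoo.

```python
def to_agent_dict(agents, values):
    """Pair an ordered sequence of values with agent names."""
    if len(agents) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agents, values))


def to_tuple(agents, by_agent):
    """Order a per-agent dictionary by the given agent list."""
    return tuple(by_agent[name] for name in agents)


agents = ["EO-1", "EO-2", "EO-3"]

# The joint action used with the tuple-based API, expressed per agent.
joint_action = to_agent_dict(agents, (7, 9, 8))
print(joint_action)
print(to_tuple(agents, joint_action))
```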

Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.

[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2026-02-03 17:26:23,869 gym                            INFO       <92.00> === STARTING STEP ===
2026-02-03 17:26:23,870 sats.satellite.EO-1            INFO       <92.00> EO-1: target index 7 tasked
2026-02-03 17:26:23,870 sats.satellite.EO-1            INFO       <92.00> EO-1: Target(tgt-534) tasked for imaging
2026-02-03 17:26:23,871 sats.satellite.EO-1            INFO       <92.00> EO-1: Target(tgt-534) window enabled: 257.6 to 447.9
2026-02-03 17:26:23,871 sats.satellite.EO-1            INFO       <92.00> EO-1: setting timed terminal event at 447.9
2026-02-03 17:26:23,872 sats.satellite.EO-2            INFO       <92.00> EO-2: target index 9 tasked
2026-02-03 17:26:23,873 sats.satellite.EO-2            INFO       <92.00> EO-2: Target(tgt-182) tasked for imaging
2026-02-03 17:26:23,873 sats.satellite.EO-2            INFO       <92.00> EO-2: Target(tgt-182) window enabled: 316.1 to 493.2
2026-02-03 17:26:23,874 sats.satellite.EO-2            INFO       <92.00> EO-2: setting timed terminal event at 493.2
2026-02-03 17:26:23,875 sats.satellite.EO-3            WARNING    <92.00> EO-3: Requires retasking but received no task.
2026-02-03 17:26:23,912 sats.satellite.EO-1            INFO       <260.00> EO-1: imaged Target(tgt-534)
2026-02-03 17:26:23,912 data.base                      INFO       <260.00> Total reward: {'EO-1': 0.6732451201264564}
2026-02-03 17:26:23,914 sats.satellite.EO-1            INFO       <260.00> EO-1: Satellite EO-1 requires retasking
2026-02-03 17:26:23,914 sats.satellite.EO-3            INFO       <260.00> EO-3: Satellite EO-3 requires retasking
2026-02-03 17:26:23,916 sats.satellite.EO-1            INFO       <260.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-02-03 17:26:23,973 sats.satellite.EO-3            INFO       <260.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-02-03 17:26:24,016 gym                            INFO       <260.00> Step reward: {'EO-1': -0.32675487987354357}
2026-02-03 17:26:24,017 gym                            INFO       <260.00> Episode terminated: ['EO-1']
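
After an agent terminates, subsequent steps should only include actions for agents that are still active. The sketch below shows one way to build that dictionary from per-agent termination and truncation flags, which the PettingZoo parallel API returns as dictionaries. The flag values here are hypothetical stand-ins shaped like the step above, and actions_for_live_agents is an illustrative helper, not part of bsk_rl.

```python
def actions_for_live_agents(agents, terminations, truncations, choose_action):
    """Return actions only for agents that are neither terminated nor truncated."""
    return {
        name: choose_action(name)
        for name in agents
        if not (terminations.get(name, False) or truncations.get(name, False))
    }


# Hypothetical flags mirroring the step above, where only EO-1 terminated.
agents = ["EO-1", "EO-2", "EO-3"]
terminations = {"EO-1": True, "EO-2": False, "EO-3": False}
truncations = {"EO-1": False, "EO-2": False, "EO-3": False}

next_actions = actions_for_live_agents(
    agents, terminations, truncations, choose_action=lambda name: 0
)
print(next_actions)
```

In practice, the environment's agents property is expected to list only the active agents after a step, so iterating over it directly may be sufficient.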