Multi-Agent Environments

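Two multi-agent environments are provided in the package:

- GeneralSatelliteTasking, which follows the Gymnasium API and groups per-satellite observations and actions into tuples.
- ConstellationTasking, which follows the PettingZoo parallel API and keys observations and actions by agent name.

The latter is preferable for multi-agent RL (MARL) settings, as most MARL algorithms are designed for this kind of API.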

Configuring the Environment

For this example, a multi-satellite target imaging environment is used. The goal is to maximize the value of unique images taken.
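As a rough picture of this objective, the sketch below credits each target's priority only to the first satellite that images it; later images of the same target earn nothing. The helper name and the priority-as-reward rule are assumptions for illustration, not the implementation of data.UniqueImageReward.

```python
from typing import Dict, Iterable, Tuple


def unique_image_reward(
    events: Iterable[Tuple[str, str]], priorities: Dict[str, float]
) -> Dict[str, float]:
    """Hypothetical helper: credit each target's priority to its first imager.

    events: (satellite_name, target_id) pairs in the order they occurred.
    priorities: target_id -> priority value.
    Returns per-satellite reward; repeat images of a target earn nothing.
    """
    imaged = set()
    rewards: Dict[str, float] = {}
    for sat, target in events:
        if target in imaged:
            continue  # already imaged by some satellite: no additional value
        imaged.add(target)
        rewards[sat] = rewards.get(sat, 0.0) + priorities[target]
    return rewards


example_priorities = {"tgt-1": 0.5, "tgt-2": 0.25}
example_events = [("EO-1", "tgt-1"), ("EO-2", "tgt-1"), ("EO-2", "tgt-2")]
print(unique_image_reward(example_events, example_priorities))  # {'EO-1': 0.5, 'EO-2': 0.25}
```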

As usual, the satellite type is defined first.

[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel
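This spec determines the space sizes that appear later: with n_ahead_observe=10 and two properties (priority and opportunity_open), each satellite observes 10 × 2 = 20 values, and n_ahead_image=10 yields 10 discrete actions. The hypothetical helper below builds one such 20-element vector. The interleaved ordering and the convention that opportunity_open is the window opening time relative to the current time, divided by norm, are assumptions rather than documented behavior of obs.OpportunityProperties.

```python
from typing import List, Sequence, Tuple


def opportunity_features(
    opportunities: Sequence[Tuple[float, float]],
    now: float,
    n_ahead: int = 10,
    norm: float = 5700.0,
) -> List[float]:
    """Hypothetical layout: [priority_0, open_0, priority_1, open_1, ...].

    opportunities: (priority, window_open_time_s) pairs, soonest first.
    The opening time is made relative to `now` and divided by `norm`.
    """
    if len(opportunities) < n_ahead:
        raise ValueError("need at least n_ahead opportunities")
    features: List[float] = []
    for priority, t_open in opportunities[:n_ahead]:
        features.extend([priority, (t_open - now) / norm])
    return features


# Ten identical opportunities whose windows open 570 s from now.
vec = opportunity_features([(0.5, 570.0)] * 10, now=0.0)
print(len(vec), vec[:2])  # 20 [0.5, 0.1]
```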

Satellite properties are set to give each satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.

[2]:

from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
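A Walker-delta pattern t/p/f places t satellites in p equally spaced orbital planes with phasing parameter f. The hypothetical helper below computes that standard slot geometry to show what a single-plane pattern means: with p = 1, the three satellites share one orbit and are spaced 360/3 = 120 degrees apart in argument of latitude. It is not bsk_rl's walker_delta_args, whose exact outputs (including any randomized offsets) are not shown in this example.

```python
from typing import List, Tuple


def walker_delta_slots(t: int, p: int, f: int) -> List[Tuple[float, float]]:
    """Hypothetical helper: (RAAN_deg, arg_of_latitude_deg) for each satellite
    of a Walker-delta t/p/f pattern (t satellites, p planes, phasing f)."""
    if t % p != 0:
        raise ValueError("t must be divisible by p")
    s = t // p  # satellites per plane
    slots = []
    for j in range(p):
        raan = 360.0 * j / p  # planes equally spaced in RAAN
        for k in range(s):
            # Even spacing within the plane, plus an inter-plane phase offset
            u = (360.0 * k / s + 360.0 * f * j / t) % 360.0
            slots.append((raan, u))
    return slots


# Three satellites in a single plane: equally spaced 120 degrees apart.
print(walker_delta_slots(3, 1, 0))  # [(0.0, 0.0), (0.0, 120.0), (0.0, 240.0)]
```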

Gym API

GeneralSatelliteTasking exchanges actions and observations with the agent as tuples, with one entry per satellite.

[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommDynModel
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
2026-01-05 18:31:23,924 gym                            INFO       Resetting environment with seed=1121006930
2026-01-05 18:31:23,926 scene.targets                  INFO       Generating 1000 targets
2026-01-05 18:31:24,101 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2026-01-05 18:31:24,147 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2026-01-05 18:31:24,185 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2026-01-05 18:31:24,225 gym                            INFO       <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a tuple (or list) with one entry per satellite. The step ends as soon as any satellite completes its action.

[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2026-01-05 18:31:24,240 gym                            INFO       <0.00> === STARTING STEP ===
2026-01-05 18:31:24,241 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2026-01-05 18:31:24,242 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-1) tasked for imaging
2026-01-05 18:31:24,243 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-1) window enabled: 557.1 to 600.0
2026-01-05 18:31:24,243 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 600.0
2026-01-05 18:31:24,245 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2026-01-05 18:31:24,246 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-780) tasked for imaging
2026-01-05 18:31:24,247 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-780) window enabled: 411.7 to 600.0
2026-01-05 18:31:24,247 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 600.0
2026-01-05 18:31:24,248 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2026-01-05 18:31:24,249 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-149) tasked for imaging
2026-01-05 18:31:24,250 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-149) window enabled: 331.4 to 478.2
2026-01-05 18:31:24,250 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 478.2
2026-01-05 18:31:24,323 sats.satellite.EO-3            INFO       <334.00> EO-3: imaged Target(tgt-149)
2026-01-05 18:31:24,324 data.base                      INFO       <334.00> Total reward: {'EO-3': 0.7469752005970821}
2026-01-05 18:31:24,325 sats.satellite.EO-3            INFO       <334.00> EO-3: Satellite EO-3 requires retasking
2026-01-05 18:31:24,325 sats.satellite.EO-1            INFO       <334.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-01-05 18:31:24,374 sats.satellite.EO-2            INFO       <334.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-01-05 18:31:24,419 sats.satellite.EO-3            INFO       <334.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-01-05 18:31:24,469 gym                            INFO       <334.00> Step reward: 0.7469752005970821
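The "stop at the first completion" rule behaves like an event-driven clock: the step lasts until the earliest pending action finishes, and only the satellites whose actions finished are flagged for retasking. The sketch below is a simplified, hypothetical model of that rule; in the real simulation, completion times emerge from the dynamics rather than being known in advance. The example times mirror the log above, where EO-3 finishes imaging at 334 s, with the 600 s window ends of EO-1 and EO-2 standing in for their completion times.

```python
from typing import Dict, Tuple


def advance_to_first_completion(
    completion_times: Dict[str, float], now: float
) -> Tuple[float, Dict[str, bool]]:
    """Hypothetical helper: advance the clock to the earliest pending completion.

    completion_times: satellite name -> absolute time its current action ends.
    Returns the step duration and a requires-retasking flag per satellite.
    """
    t_next = min(completion_times.values())
    flags = {name: t == t_next for name, t in completion_times.items()}
    return t_next - now, flags


step_dt, retask_flags = advance_to_first_completion(
    {"EO-1": 600.0, "EO-2": 600.0, "EO-3": 334.0}, now=0.0
)
print(step_dt, retask_flags)  # 334.0 {'EO-1': False, 'EO-2': False, 'EO-3': True}
```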
[6]:
observation
[6]:
(array([ 0.17814265, -0.02125003,  0.30168747,  0.01178454,  0.76385136,
         0.00151344,  0.2985024 ,  0.01093736,  0.04293276,  0.03914387,
         0.58045798,  0.02859246,  0.08123652,  0.03883981,  0.06155381,
         0.02234913,  0.33593696,  0.04059673,  0.70369807,  0.06788393]),
 array([ 0.28995587, -0.03293671,  0.35235402, -0.01068806,  0.1637984 ,
         0.02773173,  0.80337029,  0.01363022,  0.14420844,  0.04038681,
         0.51275788,  0.05106484,  0.98493201,  0.08879523,  0.40875852,
         0.07581476,  0.22659839,  0.0796111 ,  0.86945389,  0.10191264]),
 array([ 0.83611944, -0.00324811,  0.04799193, -0.00686338,  0.05630046,
         0.02147299,  0.58741631,  0.01240875,  0.39235536,  0.00493732,
         0.90627614,  0.01726355,  0.04152682,  0.06172129,  0.78675806,
         0.08765289,  0.44868687,  0.10357021,  0.38332685,  0.11378443]))

At this point, every satellite can be retasked, or a satellite can continue its previous action if None is passed as its action. To see which satellites must be retasked (i.e., their previous action is done and they have nothing more to do), check "requires_retasking" in each satellite's entry of info.

[7]:
info
[7]:
{'EO-1': {'requires_retasking': False},
 'EO-2': {'requires_retasking': False},
 'EO-3': {'requires_retasking': True},
 'd_ts': 334.0}

Based on these flags, only the satellite that requires retasking is retasked here.

[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[None, None, 0]
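The same pattern generalizes from the constant action 0 to any policy. In the hypothetical sketch below, satellite names are passed explicitly (in the notebook they come from env.unwrapped.satellites), the info dictionary is copied from the output above, and a seeded random choice stands in for a trained policy. The d_ts entry is never read, because only satellite names are looked up.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Sequence


def retask_only_needed(
    step_info: Dict[str, Any],
    sat_names: Sequence[str],
    policy: Callable[[str], int],
) -> List[Optional[int]]:
    """Hypothetical helper: query `policy` only where requires_retasking is set.

    None tells the environment to continue that satellite's current action.
    """
    return [
        policy(name) if step_info[name]["requires_retasking"] else None
        for name in sat_names
    ]


example_info = {
    "EO-1": {"requires_retasking": False},
    "EO-2": {"requires_retasking": False},
    "EO-3": {"requires_retasking": True},
    "d_ts": 334.0,
}
rng = random.Random(0)  # seeded so the example is reproducible
example_actions = retask_only_needed(
    example_info, ["EO-1", "EO-2", "EO-3"], lambda name: rng.randrange(10)
)
print(example_actions[:2])  # [None, None]
```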
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2026-01-05 18:31:24,493 gym                            INFO       <334.00> === STARTING STEP ===
2026-01-05 18:31:24,494 sats.satellite.EO-3            INFO       <334.00> EO-3: target index 0 tasked
2026-01-05 18:31:24,495 sats.satellite.EO-3            INFO       <334.00> EO-3: Target(tgt-501) tasked for imaging
2026-01-05 18:31:24,496 sats.satellite.EO-3            INFO       <334.00> EO-3: Target(tgt-501) window enabled: 315.5 to 348.3
2026-01-05 18:31:24,496 sats.satellite.EO-3            INFO       <334.00> EO-3: setting timed terminal event at 348.3
2026-01-05 18:31:24,501 sats.satellite.EO-3            INFO       <349.00> EO-3: timed termination at 348.3 for Target(tgt-501) window
2026-01-05 18:31:24,502 data.base                      INFO       <349.00> Total reward: {}
2026-01-05 18:31:24,503 sats.satellite.EO-3            INFO       <349.00> EO-3: Satellite EO-3 requires retasking
2026-01-05 18:31:24,506 gym                            INFO       <349.00> Step reward: 0.0

In GeneralSatelliteTasking, the whole episode terminates if any satellite dies. To demonstrate this, one satellite is forcibly killed.

[10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    sat = env.unwrapped.satellites[0]
    # Write an empty battery level so that the battery check fails
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    sat.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return sat.dynamics.is_alive(log_failure=log_failure) and sat.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2026-01-05 18:31:24,512 gym                            INFO       <349.00> === STARTING STEP ===
2026-01-05 18:31:24,513 sats.satellite.EO-1            INFO       <349.00> EO-1: target index 6 tasked
2026-01-05 18:31:24,514 sats.satellite.EO-1            INFO       <349.00> EO-1: Target(tgt-262) tasked for imaging
2026-01-05 18:31:24,515 sats.satellite.EO-1            INFO       <349.00> EO-1: Target(tgt-262) window enabled: 555.4 to 760.4
2026-01-05 18:31:24,515 sats.satellite.EO-1            INFO       <349.00> EO-1: setting timed terminal event at 760.4
2026-01-05 18:31:24,516 sats.satellite.EO-2            INFO       <349.00> EO-2: target index 7 tasked
2026-01-05 18:31:24,516 sats.satellite.EO-2            INFO       <349.00> EO-2: Target(tgt-887) tasked for imaging
2026-01-05 18:31:24,518 sats.satellite.EO-2            INFO       <349.00> EO-2: Target(tgt-887) window enabled: 787.8 to 955.1
2026-01-05 18:31:24,519 sats.satellite.EO-2            INFO       <349.00> EO-2: setting timed terminal event at 955.1
2026-01-05 18:31:24,520 sats.satellite.EO-3            INFO       <349.00> EO-3: target index 9 tasked
2026-01-05 18:31:24,520 sats.satellite.EO-3            INFO       <349.00> EO-3: Target(tgt-230) tasked for imaging
2026-01-05 18:31:24,521 sats.satellite.EO-3            INFO       <349.00> EO-3: Target(tgt-230) window enabled: 994.6 to 1145.1
2026-01-05 18:31:24,522 sats.satellite.EO-3            INFO       <349.00> EO-3: setting timed terminal event at 1145.1
2026-01-05 18:31:24,567 sats.satellite.EO-1            INFO       <558.00> EO-1: imaged Target(tgt-262)
2026-01-05 18:31:24,568 data.base                      INFO       <558.00> Total reward: {'EO-1': 0.08123652222409783}
2026-01-05 18:31:24,569 sats.satellite.EO-1            INFO       <558.00> EO-1: Satellite EO-1 requires retasking
2026-01-05 18:31:24,571 sats.satellite.EO-3            INFO       <558.00> EO-3: Finding opportunity windows from 1200.00 to 1800.00 seconds
2026-01-05 18:31:24,610 sats.satellite.EO-1            WARNING    <558.00> EO-1: failed battery_valid check
2026-01-05 18:31:24,612 gym                            INFO       <558.00> Step reward: -0.9187634777759022
2026-01-05 18:31:24,613 gym                            INFO       <558.00> Episode terminated: True
2026-01-05 18:31:24,613 gym                            INFO       <558.00> Episode truncated: False
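The logged step reward of -0.9187634777759022 appears consistent with EO-1's imaging reward of 0.08123652222409783 combined with a -1.0 penalty for the failed satellite, and the episode terminates for the whole team. A minimal sketch of that bookkeeping follows, assuming a fixed penalty per dead satellite and termination on any death; the helper and its default penalty are illustrative assumptions, not bsk_rl's implementation.

```python
from typing import Dict, Tuple


def gym_step_outcome(
    alive: Dict[str, bool],
    step_rewards: Dict[str, float],
    failure_penalty: float = -1.0,  # assumed penalty per dead satellite
) -> Tuple[float, bool]:
    """Hypothetical helper: scalar team reward plus a penalty per dead
    satellite; any death terminates the whole episode."""
    n_dead = sum(1 for ok in alive.values() if not ok)
    total = sum(step_rewards.values()) + failure_penalty * n_dead
    return total, n_dead > 0


demo_reward, demo_terminated = gym_step_outcome(
    alive={"EO-1": False, "EO-2": True, "EO-3": True},
    step_rewards={"EO-1": 0.08123652222409783},
)
print(round(demo_reward, 6), demo_terminated)  # -0.918763 True
```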

PettingZoo API

The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking; see the PettingZoo documentation for a full description of the parallel API. The main difference is that observations, actions, rewards, and other per-agent quantities are held in dictionaries keyed by agent name rather than in tuples.
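Because the two environments differ mainly in container type, moving between them amounts to pairing tuple positions with agent names. The hypothetical helpers below illustrate the correspondence, assuming agent names appear in the same order as the satellites passed to the environment. Filling a missing agent with None mirrors the Gym-side convention for continuing an action; later in this example, an agent omitted from the action dictionary (EO-3) keeps its previous task and goes on to image its target, which suggests the PettingZoo environment treats omission the same way.

```python
from typing import Any, Dict, Sequence, Tuple


def tuple_to_agent_dict(agents: Sequence[str], values: Sequence[Any]) -> Dict[str, Any]:
    """Hypothetical helper: key per-satellite values (tuple order) by agent name."""
    if len(agents) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agents, values))


def agent_dict_to_tuple(
    agents: Sequence[str], values: Dict[str, Any], default: Any = None
) -> Tuple[Any, ...]:
    """Hypothetical helper: order a per-agent dict as a tuple; agents without
    an entry get `default`."""
    return tuple(values.get(name, default) for name in agents)


names = ["EO-1", "EO-2", "EO-3"]
print(tuple_to_agent_dict(names, [7, 9, 8]))  # {'EO-1': 7, 'EO-2': 9, 'EO-3': 8}
print(agent_dict_to_tuple(names, {"EO-1": 7, "EO-2": 9}))  # (7, 9, None)
```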

[11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommDynModel
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
2026-01-05 18:31:24,621                                WARNING    Creating logger for new env on PID=4341. Old environments in process may now log times incorrectly.
2026-01-05 18:31:24,736 gym                            INFO       Resetting environment with seed=1932994028
2026-01-05 18:31:24,738 scene.targets                  INFO       Generating 1000 targets
2026-01-05 18:31:24,895 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2026-01-05 18:31:24,935 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2026-01-05 18:31:24,978 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2026-01-05 18:31:25,016 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-01-05 18:31:25,052 gym                            INFO       <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the agents property.

[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2026-01-05 18:31:25,067 gym                            INFO       <0.00> === STARTING STEP ===
2026-01-05 18:31:25,068 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2026-01-05 18:31:25,068 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-150) tasked for imaging
2026-01-05 18:31:25,070 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-150) window enabled: 448.9 to 517.6
2026-01-05 18:31:25,070 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 517.6
2026-01-05 18:31:25,071 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2026-01-05 18:31:25,072 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-70) tasked for imaging
2026-01-05 18:31:25,072 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-70) window enabled: 328.0 to 384.8
2026-01-05 18:31:25,073 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 384.8
2026-01-05 18:31:25,074 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2026-01-05 18:31:25,075 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-810) tasked for imaging
2026-01-05 18:31:25,076 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-810) window enabled: 467.8 to 648.4
2026-01-05 18:31:25,076 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 648.4
2026-01-05 18:31:25,147 sats.satellite.EO-2            INFO       <331.00> EO-2: imaged Target(tgt-70)
2026-01-05 18:31:25,148 data.base                      INFO       <331.00> Total reward: {'EO-2': 0.8217686059185982}
2026-01-05 18:31:25,149 sats.satellite.EO-2            INFO       <331.00> EO-2: Satellite EO-2 requires retasking
2026-01-05 18:31:25,150 sats.satellite.EO-2            INFO       <331.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-01-05 18:31:25,195 gym                            INFO       <331.00> Step reward: {'EO-2': 0.8217686059185982}
[14]:
observation
[14]:
{'EO-1': array([ 2.42971514e-01, -1.91580068e-02,  6.04465645e-01, -2.40473101e-02,
         6.59537330e-01,  2.06772536e-02,  8.82758624e-01, -8.78256081e-04,
         7.04726742e-01,  3.51900393e-02,  5.79475556e-01,  4.08368029e-02,
         2.99928617e-01,  4.42073517e-02,  7.96319618e-01,  2.85676079e-02,
         2.78967637e-01,  3.45188389e-02,  9.39378405e-01,  3.08168325e-02]),
 'EO-2': array([ 0.80261331, -0.02098239,  0.13329756, -0.02118758,  0.31734636,
        -0.00440867,  0.02612522,  0.04337834,  0.12517794,  0.02262706,
         0.13741503,  0.02178146,  0.20850617,  0.02615243,  0.15869366,
         0.03430896,  0.07001425,  0.04041201,  0.15496542,  0.05210047]),
 'EO-3': array([ 3.11418108e-01, -2.54378510e-02,  5.28457083e-01, -3.67511225e-04,
         6.92753862e-01,  2.01679370e-02,  9.30742391e-01,  3.22472822e-02,
         9.92685912e-01,  2.40057959e-02,  4.76575499e-01,  4.82814009e-02,
         5.37040256e-01,  6.08591040e-02,  8.04897853e-01,  5.86380772e-02,
         3.58248052e-01,  6.04402990e-02,  6.87356185e-01,  1.03003385e-01])}

Apart from compatibility with MARL algorithms, the main benefit of the PettingZoo API is that individual agents can fail without terminating the entire environment.

[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2026-01-05 18:31:25,212 gym                            INFO       <331.00> === STARTING STEP ===
2026-01-05 18:31:25,212 sats.satellite.EO-1            INFO       <331.00> EO-1: target index 7 tasked
2026-01-05 18:31:25,213 sats.satellite.EO-1            INFO       <331.00> EO-1: Target(tgt-831) tasked for imaging
2026-01-05 18:31:25,214 sats.satellite.EO-1            INFO       <331.00> EO-1: Target(tgt-831) window enabled: 493.8 to 600.0
2026-01-05 18:31:25,215 sats.satellite.EO-1            INFO       <331.00> EO-1: setting timed terminal event at 600.0
2026-01-05 18:31:25,216 sats.satellite.EO-2            INFO       <331.00> EO-2: target index 9 tasked
2026-01-05 18:31:25,217 sats.satellite.EO-2            INFO       <331.00> EO-2: Target(tgt-881) tasked for imaging
2026-01-05 18:31:25,218 sats.satellite.EO-2            INFO       <331.00> EO-2: Target(tgt-881) window enabled: 628.0 to 806.4
2026-01-05 18:31:25,218 sats.satellite.EO-2            INFO       <331.00> EO-2: setting timed terminal event at 806.4
2026-01-05 18:31:25,253 sats.satellite.EO-3            INFO       <470.00> EO-3: imaged Target(tgt-810)
2026-01-05 18:31:25,254 data.base                      INFO       <470.00> Total reward: {'EO-3': 0.9926859122075967}
2026-01-05 18:31:25,255 sats.satellite.EO-3            INFO       <470.00> EO-3: Satellite EO-3 requires retasking
2026-01-05 18:31:25,257 sats.satellite.EO-1            INFO       <470.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-01-05 18:31:25,302 gym                            INFO       <470.00> Step reward: {'EO-1': -1.0, 'EO-3': 0.9926859122075967}
2026-01-05 18:31:25,302 gym                            INFO       <470.00> Episode terminated: ['EO-1']
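After this step only EO-1 has terminated, while EO-2 and EO-3 keep running. Under the PettingZoo parallel API convention, agents that terminate or truncate are dropped from env.agents, so a control loop should only build actions for agents that remain active. A hypothetical helper that derives that list from the terminated and truncated dictionaries is sketched below; the dictionary contents are hand-written to mirror the final log line rather than copied from the environment.

```python
from typing import Dict, List, Sequence


def live_agents(
    agents: Sequence[str],
    terminated: Dict[str, bool],
    truncated: Dict[str, bool],
) -> List[str]:
    """Hypothetical helper: agents that are neither terminated nor truncated,
    i.e. the ones that still need actions on the next step."""
    return [
        a for a in agents
        if not (terminated.get(a, False) or truncated.get(a, False))
    ]


demo_agents = ["EO-1", "EO-2", "EO-3"]
demo_terminated_by_agent = {"EO-1": True, "EO-2": False, "EO-3": False}
demo_truncated_by_agent = {"EO-1": False, "EO-2": False, "EO-3": False}
print(live_agents(demo_agents, demo_terminated_by_agent, demo_truncated_by_agent))  # ['EO-2', 'EO-3']
```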