Multi-Agent Environments

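Two multi-agent environments are provided in the package: GeneralSatelliteTasking, which follows the Gym API and exchanges actions and observations as tuples, and ConstellationTasking, which implements the PettingZoo parallel API.

The latter is preferable for multi-agent RL (MARL) settings, as most MARL algorithms are designed for this kind of API.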

Configuring the Environment

For this example, a multi-satellite target imaging environment is used. The goal is to maximize the total value of unique images taken.

As usual, the satellite type is defined first.

[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.

[2]:

from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
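
As a rough illustration of what a single-plane arrangement implies, the sketch below spaces satellites evenly in phase around one orbit and applies one shared random offset, so the randomization is correlated across satellites. It does not call bsk_rl. The helper name single_plane_phases and its parameters are hypothetical, and the real walker_delta_args may randomize other orbital elements as well.

```python
import random


def single_plane_phases(n_sats, offset_deg=0.0):
    """Evenly spaced phase angles in degrees for one orbital plane (hypothetical helper)."""
    spacing = 360.0 / n_sats
    return [(offset_deg + i * spacing) % 360.0 for i in range(n_sats)]


# One shared offset moves every satellite together, so relative spacing is preserved.
shared_offset = random.uniform(0.0, 360.0)
phases = single_plane_phases(3, shared_offset)
```

Drawing the offset once, rather than once per satellite, is what keeps the constellation geometry intact from episode to episode.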

Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.

[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
2025-08-25 18:15:57,637 gym                            INFO       Resetting environment with seed=432245736
2025-08-25 18:15:57,639 scene.targets                  INFO       Generating 1000 targets
2025-08-25 18:15:57,813 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-08-25 18:15:57,857 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-08-25 18:15:57,984 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-08-25 18:15:58,022 gym                            INFO       <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a sequence with one entry per satellite, in the same order as the satellites list. The step ends as soon as any satellite completes its action.

[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2025-08-25 18:15:58,036 gym                            INFO       <0.00> === STARTING STEP ===
2025-08-25 18:15:58,037 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-08-25 18:15:58,037 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-424) tasked for imaging
2025-08-25 18:15:58,039 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-424) window enabled: 350.2 to 547.0
2025-08-25 18:15:58,040 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 547.0
2025-08-25 18:15:58,041 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-08-25 18:15:58,041 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-170) tasked for imaging
2025-08-25 18:15:58,042 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-170) window enabled: 184.5 to 394.1
2025-08-25 18:15:58,043 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 394.1
2025-08-25 18:15:58,044 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-08-25 18:15:58,044 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-610) tasked for imaging
2025-08-25 18:15:58,046 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-610) window enabled: 590.9 to 600.0
2025-08-25 18:15:58,046 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 600.0
2025-08-25 18:15:58,156 sats.satellite.EO-2            INFO       <187.00> EO-2: imaged Target(tgt-170)
2025-08-25 18:15:58,159 data.base                      INFO       <187.00> Total reward: {'EO-2': 0.6279896668993955}
2025-08-25 18:15:58,160 sats.satellite.EO-2            INFO       <187.00> EO-2: Satellite EO-2 requires retasking
2025-08-25 18:15:58,161 sats.satellite.EO-3            INFO       <187.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-08-25 18:15:58,198 gym                            INFO       <187.00> Step reward: 0.6279896668993955
[6]:
observation
[6]:
(array([ 0.67195526, -0.0131385 ,  0.58821643,  0.01041697,  0.10928641,
         0.00464409,  0.45229359,  0.0152561 ,  0.57096796,  0.01912471,
         0.70120367,  0.02862455,  0.43513721,  0.04441111,  0.02280456,
         0.07220715,  0.88551713,  0.0674394 ,  0.61661911,  0.04775066]),
 array([ 0.89953843, -0.03161616,  0.32453705, -0.0085499 ,  0.47339624,
         0.00316136,  0.32601468,  0.02442561,  0.13726415,  0.02219632,
         0.54246879,  0.04297493,  0.64717368,  0.0218927 ,  0.57144445,
         0.0511392 ,  0.51012988,  0.05700217,  0.0127243 ,  0.06130879]),
 array([ 0.86310339, -0.01607339,  0.96719884, -0.02394042,  0.04613973,
         0.01306576,  0.95189648,  0.02825314,  0.95240753,  0.07085935,
         0.57649121,  0.07021517,  0.81911575,  0.03711557,  0.81113628,
         0.08233615,  0.35709025,  0.07635983,  0.12642451,  0.07581924]))
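
Each observation has 20 entries, matching 10 upcoming opportunities with 2 properties each. The sketch below splits the first satellite's observation into per-opportunity pairs and picks the opportunity with the highest priority. The interleaved layout (priority, then normalized opening time) is an assumption inferred from the values shown above, not something stated by the library.

```python
# First satellite's observation, copied from the output above.
obs_eo1 = [
    0.67195526, -0.0131385, 0.58821643, 0.01041697, 0.10928641,
    0.00464409, 0.45229359, 0.0152561, 0.57096796, 0.01912471,
    0.70120367, 0.02862455, 0.43513721, 0.04441111, 0.02280456,
    0.07220715, 0.88551713, 0.0674394, 0.61661911, 0.04775066,
]

# Assumed layout: [priority_0, open_0, priority_1, open_1, ...].
N_PROPS = 2
opportunities = [
    tuple(obs_eo1[i : i + N_PROPS]) for i in range(0, len(obs_eo1), N_PROPS)
]

# Greedy choice: index of the upcoming opportunity with the highest priority.
greedy_action = max(range(len(opportunities)), key=lambda i: opportunities[i][0])
print(greedy_action)
```

Under that assumed layout, the selected index can be passed directly as the action for this satellite, since the action space is Discrete(10) over the same upcoming opportunities.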

At this point, either every satellite can be retasked, or satellites can continue their previous action by passing None as the action. To see which satellites must be retasked (i.e. their previous action is done and they have nothing more to do), look at "requires_retasking" in each satellite’s info.

[7]:
info
[7]:
{'EO-1': {'requires_retasking': False},
 'EO-2': {'requires_retasking': True},
 'EO-3': {'requires_retasking': False},
 'd_ts': 187.0}
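
Note that the info dictionary mixes per-satellite entries with the scalar "d_ts". A minimal sketch of extracting the satellites that need a new action, using the dictionary printed above as literal data, might look like this; the helper name satellites_to_retask is hypothetical.

```python
# Info dictionary copied from the output above.
info = {
    "EO-1": {"requires_retasking": False},
    "EO-2": {"requires_retasking": True},
    "EO-3": {"requires_retasking": False},
    "d_ts": 187.0,
}


def satellites_to_retask(info):
    """Names whose info entry flags requires_retasking (hypothetical helper)."""
    return [
        name
        for name, entry in info.items()
        # Skip non-dictionary entries such as "d_ts".
        if isinstance(entry, dict) and entry.get("requires_retasking", False)
    ]


needs_action = satellites_to_retask(info)
```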

Based on this dictionary, only the satellite that requires retasking is given a new action here; the others are passed None and continue their current action.

[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[None, 0, None]
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2025-08-25 18:15:58,218 gym                            INFO       <187.00> === STARTING STEP ===
2025-08-25 18:15:58,219 sats.satellite.EO-2            INFO       <187.00> EO-2: target index 0 tasked
2025-08-25 18:15:58,220 sats.satellite.EO-2            INFO       <187.00> EO-2: Target(tgt-435) tasked for imaging
2025-08-25 18:15:58,221 sats.satellite.EO-2            INFO       <187.00> EO-2: Target(tgt-435) window enabled: 6.8 to 210.2
2025-08-25 18:15:58,222 sats.satellite.EO-2            INFO       <187.00> EO-2: setting timed terminal event at 210.2
2025-08-25 18:15:58,237 sats.satellite.EO-2            INFO       <211.00> EO-2: timed termination at 210.2 for Target(tgt-435) window
2025-08-25 18:15:58,241 data.base                      INFO       <211.00> Total reward: {}
2025-08-25 18:15:58,241 sats.satellite.EO-2            INFO       <211.00> EO-2: Satellite EO-2 requires retasking
2025-08-25 18:15:58,244 gym                            INFO       <211.00> Step reward: 0.0

With the Gym API, the episode terminates if any satellite dies. To demonstrate this, one satellite is forcibly killed by replacing its is_alive method with a mock that writes an empty battery level.

[10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2025-08-25 18:15:58,250 gym                            INFO       <211.00> === STARTING STEP ===
2025-08-25 18:15:58,251 sats.satellite.EO-1            INFO       <211.00> EO-1: target index 6 tasked
2025-08-25 18:15:58,252 sats.satellite.EO-1            INFO       <211.00> EO-1: Target(tgt-207) tasked for imaging
2025-08-25 18:15:58,253 sats.satellite.EO-1            INFO       <211.00> EO-1: Target(tgt-207) window enabled: 440.1 to 600.0
2025-08-25 18:15:58,254 sats.satellite.EO-1            INFO       <211.00> EO-1: setting timed terminal event at 600.0
2025-08-25 18:15:58,255 sats.satellite.EO-2            INFO       <211.00> EO-2: target index 7 tasked
2025-08-25 18:15:58,256 sats.satellite.EO-2            INFO       <211.00> EO-2: Target(tgt-109) tasked for imaging
2025-08-25 18:15:58,257 sats.satellite.EO-2            INFO       <211.00> EO-2: Target(tgt-109) window enabled: 511.9 to 600.0
2025-08-25 18:15:58,257 sats.satellite.EO-2            INFO       <211.00> EO-2: setting timed terminal event at 600.0
2025-08-25 18:15:58,258 sats.satellite.EO-3            INFO       <211.00> EO-3: target index 9 tasked
2025-08-25 18:15:58,259 sats.satellite.EO-3            INFO       <211.00> EO-3: Target(tgt-121) tasked for imaging
2025-08-25 18:15:58,260 sats.satellite.EO-3            INFO       <211.00> EO-3: Target(tgt-121) window enabled: 619.2 to 822.5
2025-08-25 18:15:58,260 sats.satellite.EO-3            INFO       <211.00> EO-3: setting timed terminal event at 822.5
2025-08-25 18:15:58,396 sats.satellite.EO-1            INFO       <443.00> EO-1: imaged Target(tgt-207)
2025-08-25 18:15:58,399 data.base                      INFO       <443.00> Total reward: {'EO-1': 0.43513721064852073}
2025-08-25 18:15:58,400 sats.satellite.EO-1            INFO       <443.00> EO-1: Satellite EO-1 requires retasking
2025-08-25 18:15:58,400 sats.satellite.EO-1            INFO       <443.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-08-25 18:15:58,454 sats.satellite.EO-1            WARNING    <443.00> EO-1: failed battery_valid check
2025-08-25 18:15:58,455 gym                            INFO       <443.00> Step reward: -0.5648627893514793
2025-08-25 18:15:58,455 gym                            INFO       <443.00> Episode terminated: True
2025-08-25 18:15:58,456 gym                            INFO       <443.00> Episode truncated: False
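
The negative step reward in the log above is consistent with the imaging reward earned by EO-1 plus a penalty of -1.0 for the failed satellite. The short check below reproduces that arithmetic; the penalty value is inferred from the logged numbers rather than taken from the library's documentation.

```python
import math

imaging_reward = 0.43513721064852073  # EO-1 reward logged at <443.00>
failure_penalty = -1.0  # assumed value, inferred from the logged step reward

step_reward = imaging_reward + failure_penalty

logged_step_reward = -0.5648627893514793
matches_log = math.isclose(step_reward, logged_step_reward)
```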

PettingZoo API

The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo documentation for a full description of the API. It separates values into dictionaries keyed by agent name, rather than tuples.

[11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
2025-08-25 18:15:58,464                                WARNING    Creating logger for new env on PID=4865. Old environments in process may now log times incorrectly.
2025-08-25 18:15:58,570 gym                            INFO       Resetting environment with seed=1366178557
2025-08-25 18:15:58,573 scene.targets                  INFO       Generating 1000 targets
2025-08-25 18:15:58,734 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-08-25 18:15:58,772 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-08-25 18:15:58,810 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-08-25 18:15:58,848 gym                            INFO       <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the agents property.

[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2025-08-25 18:15:58,862 gym                            INFO       <0.00> === STARTING STEP ===
2025-08-25 18:15:58,862 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-08-25 18:15:58,863 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-788) tasked for imaging
2025-08-25 18:15:58,865 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-788) window enabled: 410.2 to 581.2
2025-08-25 18:15:58,865 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 581.2
2025-08-25 18:15:58,866 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-08-25 18:15:58,867 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-661) tasked for imaging
2025-08-25 18:15:58,868 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-661) window enabled: 420.5 to 516.4
2025-08-25 18:15:58,868 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 516.4
2025-08-25 18:15:58,869 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-08-25 18:15:58,870 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-75) tasked for imaging
2025-08-25 18:15:58,871 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-75) window enabled: 431.3 to 550.7
2025-08-25 18:15:58,871 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 550.7
2025-08-25 18:15:59,112 sats.satellite.EO-1            INFO       <413.00> EO-1: imaged Target(tgt-788)
2025-08-25 18:15:59,115 data.base                      INFO       <413.00> Total reward: {'EO-1': 0.9240590572874334}
2025-08-25 18:15:59,116 sats.satellite.EO-1            INFO       <413.00> EO-1: Satellite EO-1 requires retasking
2025-08-25 18:15:59,118 sats.satellite.EO-1            INFO       <413.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-08-25 18:15:59,156 sats.satellite.EO-2            INFO       <413.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-08-25 18:15:59,193 sats.satellite.EO-3            INFO       <413.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-08-25 18:15:59,236 gym                            INFO       <413.00> Step reward: {'EO-1': 0.9240590572874334}
[14]:
observation
[14]:
{'EO-1': array([ 0.38628869, -0.02397547,  0.68500439, -0.0080666 ,  0.04683832,
        -0.00843742,  0.94461549, -0.00313442,  0.41817338,  0.03382674,
         0.36677655,  0.03155474,  0.553483  ,  0.02137605,  0.94613961,
         0.02440118,  0.31712277,  0.03783754,  0.26500127,  0.04130052]),
 'EO-2': array([ 0.84462532, -0.00906304,  0.65355608, -0.02041326,  0.53545735,
         0.001324  ,  0.27110441,  0.01721704,  0.80348214,  0.00587785,
         0.7782801 ,  0.0261561 ,  0.77831711,  0.00246791,  0.72494816,
         0.01603342,  0.27434738,  0.05507907,  0.32957325,  0.07789707]),
 'EO-3': array([0.12423766, 0.00321866, 0.07055941, 0.02019085, 0.8965937 ,
        0.03131486, 0.93046695, 0.01436028, 0.05699262, 0.02994066,
        0.43220101, 0.03842783, 0.07199655, 0.03329278, 0.51333655,
        0.04460587, 0.64744905, 0.05266886, 0.33198582, 0.05377176])}
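
When actions are produced as an ordered list, for example by code written against the tuple-based API, they can be converted to the dictionary form by zipping them with the agent names. This sketch uses a literal list as a stand-in for env.agents.

```python
agents = ["EO-1", "EO-2", "EO-3"]  # stand-in for env.agents
target_indices = [7, 9, 8]  # one action index per agent, in the same order

# Dictionary keyed by agent name, as expected by the parallel API step call.
actions = dict(zip(agents, target_indices))
```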

Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.

[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2025-08-25 18:15:59,251 gym                            INFO       <413.00> === STARTING STEP ===
2025-08-25 18:15:59,252 sats.satellite.EO-1            INFO       <413.00> EO-1: target index 7 tasked
2025-08-25 18:15:59,253 sats.satellite.EO-1            INFO       <413.00> EO-1: Target(tgt-454) tasked for imaging
2025-08-25 18:15:59,254 sats.satellite.EO-1            INFO       <413.00> EO-1: Target(tgt-454) window enabled: 552.1 to 761.8
2025-08-25 18:15:59,255 sats.satellite.EO-1            INFO       <413.00> EO-1: setting timed terminal event at 761.8
2025-08-25 18:15:59,256 sats.satellite.EO-2            INFO       <413.00> EO-2: target index 9 tasked
2025-08-25 18:15:59,256 sats.satellite.EO-2            INFO       <413.00> EO-2: Target(tgt-253) tasked for imaging
2025-08-25 18:15:59,257 sats.satellite.EO-2            INFO       <413.00> EO-2: Target(tgt-253) window enabled: 857.0 to 935.3
2025-08-25 18:15:59,258 sats.satellite.EO-2            INFO       <413.00> EO-2: setting timed terminal event at 935.3
2025-08-25 18:15:59,272 sats.satellite.EO-3            INFO       <434.00> EO-3: imaged Target(tgt-75)
2025-08-25 18:15:59,275 data.base                      INFO       <434.00> Total reward: {'EO-3': 0.12423766204161057}
2025-08-25 18:15:59,275 sats.satellite.EO-3            INFO       <434.00> EO-3: Satellite EO-3 requires retasking
2025-08-25 18:15:59,279 gym                            INFO       <434.00> Step reward: {'EO-1': -1.0, 'EO-3': 0.12423766204161057}
2025-08-25 18:15:59,279 gym                            INFO       <434.00> Episode terminated: ['EO-1']
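
After a step like this one, later actions should only be built for agents that are still active. The sketch below filters agents using per-agent termination and truncation flags. The dictionaries are hypothetical values shaped like the parallel API return values and chosen to match the log above, where only EO-1 terminated.

```python
agents = ["EO-1", "EO-2", "EO-3"]

# Hypothetical flags consistent with "Episode terminated: ['EO-1']" in the log.
terminated = {"EO-1": True, "EO-2": False, "EO-3": False}
truncated = {"EO-1": False, "EO-2": False, "EO-3": False}

# Keep agents that are neither terminated nor truncated.
surviving = [
    name
    for name in agents
    if not (terminated.get(name, False) or truncated.get(name, False))
]

# Placeholder policy: task each surviving agent with action index 0.
next_actions = {name: 0 for name in surviving}
```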