Multi-Agent Environments

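Two multi-agent environments are provided in the package:

- GeneralSatelliteTasking, a Gym-style environment that exchanges tuples of actions and observations.
- ConstellationTasking, which implements the PettingZoo parallel API.

The latter is preferable for multi-agent RL (MARL) settings, as most algorithms are designed for this kind of API.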

Configuring the Environment

For this example, a multisatellite target imaging environment will be used. The goal is to maximize the value of unique images taken.

As usual, the satellite type is defined first.

[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.

[2]:

from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
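To illustrate what correlated randomization means here, the following minimal sketch draws a single random offset and applies it to every satellite, so the satellites stay evenly spaced within one plane. It is not the implementation of walker_delta_args; the function name, the "true_anomaly_deg" key, and the return format are assumptions made purely for illustration.

```python
import random


def single_plane_phasing(names, seed=None):
    """Hypothetical helper: evenly phase satellites in one orbital plane."""
    rng = random.Random(seed)
    # One shared draw: every satellite is shifted by the same random offset,
    # so the relative spacing is preserved (the "correlated" part).
    offset = rng.uniform(0.0, 360.0)
    spacing = 360.0 / len(names)
    return {
        name: {"true_anomaly_deg": (offset + i * spacing) % 360.0}
        for i, name in enumerate(names)
    }


phasing = single_plane_phasing(["EO-1", "EO-2", "EO-3"], seed=0)
```

Drawing the parameters independently for each satellite would instead scatter them along the orbit and break the constellation geometry.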

Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.

[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
2025-11-05 22:48:38,849 gym                            INFO       Resetting environment with seed=1547192042
2025-11-05 22:48:38,851 scene.targets                  INFO       Generating 1000 targets
2025-11-05 22:48:39,018 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-05 22:48:39,057 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-05 22:48:39,094 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-05 22:48:39,134 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-05 22:48:39,170 gym                            INFO       <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))

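Consequently, actions are passed as a tuple with one entry per satellite, in the order the satellites were given; the list used below works as well. The step ends the first time any satellite completes its action.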

[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2025-11-05 22:48:39,185 gym                            INFO       <0.00> === STARTING STEP ===
2025-11-05 22:48:39,186 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-11-05 22:48:39,186 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-519) tasked for imaging
2025-11-05 22:48:39,187 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-519) window enabled: 614.1 to 645.0
2025-11-05 22:48:39,188 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 645.0
2025-11-05 22:48:39,189 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-11-05 22:48:39,190 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-811) tasked for imaging
2025-11-05 22:48:39,190 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-811) window enabled: 173.7 to 374.7
2025-11-05 22:48:39,191 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 374.7
2025-11-05 22:48:39,192 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-11-05 22:48:39,192 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-111) tasked for imaging
2025-11-05 22:48:39,193 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-111) window enabled: 437.1 to 600.0
2025-11-05 22:48:39,194 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 600.0
2025-11-05 22:48:39,235 sats.satellite.EO-2            INFO       <176.00> EO-2: imaged Target(tgt-811)
2025-11-05 22:48:39,236 data.base                      INFO       <176.00> Total reward: {'EO-2': 0.5079934949850267}
2025-11-05 22:48:39,237 sats.satellite.EO-2            INFO       <176.00> EO-2: Satellite EO-2 requires retasking
2025-11-05 22:48:39,238 sats.satellite.EO-3            INFO       <176.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-05 22:48:39,279 gym                            INFO       <176.00> Step reward: 0.5079934949850267
[6]:
observation
[6]:
(array([ 0.71621771, -0.01943427,  0.63597323, -0.03087719,  0.56637792,
        -0.01223017,  0.60716887, -0.01812702,  0.56614095, -0.01209571,
         0.06557256,  0.02623297,  0.22177599,  0.07686464,  0.32943726,
         0.08241237,  0.99494099,  0.09417084,  0.71819909,  0.13085989]),
 array([ 0.02048901, -0.01775804,  0.06075115, -0.02341873,  0.49419605,
        -0.01716148,  0.37942278, -0.01944149,  0.12317875,  0.00592013,
         0.44738873, -0.00913504,  0.81575916,  0.02518977,  0.73497147,
         0.03296025,  0.12923178,  0.01267443,  0.66896754,  0.03189218]),
 array([ 0.18337927, -0.03087719,  0.65728279, -0.00812571,  0.45632543,
         0.01313141,  0.63298028,  0.01593659,  0.41372112,  0.00767957,
         0.81193785,  0.02570958,  0.28951229,  0.04579868,  0.87917222,
         0.06156085,  0.57518884,  0.06539764,  0.25933596,  0.07536863]))

At this point, either every satellite can be retasked, or satellites can continue their previous action by passing None as the action. To see which satellites must be retasked (i.e. their previous action is done and they have nothing more to do), look at "requires_retasking" in each satellite’s info.

[7]:
info
[7]:
{'EO-1': {'requires_retasking': False},
 'EO-2': {'requires_retasking': True},
 'EO-3': {'requires_retasking': False},
 'd_ts': 176.0}
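Note that the info dictionary mixes per-satellite entries with other keys such as 'd_ts'. The following is a minimal sketch of extracting the satellites that need a new action from the dictionary alone; needs_retasking is a hypothetical helper, not part of the package.

```python
# Info dictionary copied from the output above.
info = {
    "EO-1": {"requires_retasking": False},
    "EO-2": {"requires_retasking": True},
    "EO-3": {"requires_retasking": False},
    "d_ts": 176.0,
}


def needs_retasking(info):
    """Hypothetical helper: names of satellites that must be given a new action."""
    return [
        name
        for name, entry in info.items()
        # Skip non-satellite entries such as "d_ts", which is a float, not a dict.
        if isinstance(entry, dict) and entry.get("requires_retasking", False)
    ]


to_retask = needs_retasking(info)
```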

Based on this information, only the satellite that requires it is retasked here.

[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[None, 0, None]
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2025-11-05 22:48:39,301 gym                            INFO       <176.00> === STARTING STEP ===
2025-11-05 22:48:39,302 sats.satellite.EO-2            INFO       <176.00> EO-2: target index 0 tasked
2025-11-05 22:48:39,302 sats.satellite.EO-2            INFO       <176.00> EO-2: Target(tgt-236) tasked for imaging
2025-11-05 22:48:39,303 sats.satellite.EO-2            INFO       <176.00> EO-2: Target(tgt-236) window enabled: 74.8 to 233.4
2025-11-05 22:48:39,304 sats.satellite.EO-2            INFO       <176.00> EO-2: setting timed terminal event at 233.4
2025-11-05 22:48:39,320 sats.satellite.EO-2            INFO       <234.00> EO-2: timed termination at 233.4 for Target(tgt-236) window
2025-11-05 22:48:39,321 data.base                      INFO       <234.00> Total reward: {}
2025-11-05 22:48:39,322 sats.satellite.EO-2            INFO       <234.00> EO-2: Satellite EO-2 requires retasking
2025-11-05 22:48:39,325 gym                            INFO       <234.00> Step reward: 0.0

With the Gym API, the episode terminates if any satellite dies. To demonstrate this, one satellite is forcibly killed.

[10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2025-11-05 22:48:39,331 gym                            INFO       <234.00> === STARTING STEP ===
2025-11-05 22:48:39,331 sats.satellite.EO-1            INFO       <234.00> EO-1: target index 6 tasked
2025-11-05 22:48:39,332 sats.satellite.EO-1            INFO       <234.00> EO-1: Target(tgt-84) tasked for imaging
2025-11-05 22:48:39,333 sats.satellite.EO-1            INFO       <234.00> EO-1: Target(tgt-84) window enabled: 921.9 to 953.4
2025-11-05 22:48:39,334 sats.satellite.EO-1            INFO       <234.00> EO-1: setting timed terminal event at 953.4
2025-11-05 22:48:39,335 sats.satellite.EO-2            INFO       <234.00> EO-2: target index 7 tasked
2025-11-05 22:48:39,335 sats.satellite.EO-2            INFO       <234.00> EO-2: Target(tgt-394) tasked for imaging
2025-11-05 22:48:39,336 sats.satellite.EO-2            INFO       <234.00> EO-2: Target(tgt-394) window enabled: 248.2 to 453.9
2025-11-05 22:48:39,337 sats.satellite.EO-2            INFO       <234.00> EO-2: setting timed terminal event at 453.9
2025-11-05 22:48:39,338 sats.satellite.EO-3            INFO       <234.00> EO-3: target index 9 tasked
2025-11-05 22:48:39,338 sats.satellite.EO-3            INFO       <234.00> EO-3: Target(tgt-835) tasked for imaging
2025-11-05 22:48:39,339 sats.satellite.EO-3            INFO       <234.00> EO-3: Target(tgt-835) window enabled: 663.9 to 831.2
2025-11-05 22:48:39,339 sats.satellite.EO-3            INFO       <234.00> EO-3: setting timed terminal event at 831.2
2025-11-05 22:48:39,355 sats.satellite.EO-2            INFO       <298.00> EO-2: imaged Target(tgt-394)
2025-11-05 22:48:39,356 data.base                      INFO       <298.00> Total reward: {'EO-2': 0.12923178411361247}
2025-11-05 22:48:39,357 sats.satellite.EO-2            INFO       <298.00> EO-2: Satellite EO-2 requires retasking
2025-11-05 22:48:39,358 sats.satellite.EO-2            INFO       <298.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-05 22:48:39,399 sats.satellite.EO-1            WARNING    <298.00> EO-1: failed battery_valid check
2025-11-05 22:48:39,401 gym                            INFO       <298.00> Step reward: -0.8707682158863875
2025-11-05 22:48:39,401 gym                            INFO       <298.00> Episode terminated: True
2025-11-05 22:48:39,402 gym                            INFO       <298.00> Episode truncated: False
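The rewards logged for this step are consistent with a penalty of 1.0 being subtracted when a satellite fails: EO-2 earned about 0.129 for its image, and the step reward is that value minus 1.0. The penalty value is inferred from the log, not stated in this example, so the check below should be read as a sketch under that assumption.

```python
# Values copied from the log above.
image_rewards = {"EO-2": 0.12923178411361247}
logged_step_reward = -0.8707682158863875

# Assumption: a failure penalty of -1.0 is applied for the dead satellite (EO-1).
# This value is inferred from the log, not taken from the library documentation.
assumed_failure_penalty = -1.0

step_reward = sum(image_rewards.values()) + assumed_failure_penalty
```

Because the Gym API returns a single scalar reward, the penalty is summed together with the image rewards from all satellites.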

PettingZoo API

The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo documentation for a full description of the API. It generally uses dictionaries keyed by agent name where the Gym API uses tuples.

[11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
2025-11-05 22:48:39,410                                WARNING    Creating logger for new env on PID=4827. Old environments in process may now log times incorrectly.
2025-11-05 22:48:39,534 gym                            INFO       Resetting environment with seed=2232055917
2025-11-05 22:48:39,536 scene.targets                  INFO       Generating 1000 targets
2025-11-05 22:48:39,691 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-05 22:48:39,728 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-05 22:48:39,768 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-05 22:48:39,802 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-05 22:48:39,837 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-05 22:48:39,873 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-05 22:48:39,908 gym                            INFO       <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}
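The two interfaces differ mainly in container type, so values can be moved between them with a simple mapping. A minimal sketch, using the agent names shown above and plain Python containers (no environment is needed to run it):

```python
agents = ["EO-1", "EO-2", "EO-3"]
gym_style_actions = (7, 9, 8)

# Pair each agent name with its action, preserving satellite order.
action_dict = dict(zip(agents, gym_style_actions))

# And back again, e.g. to reuse code written for the tuple-based API.
action_tuple = tuple(action_dict[name] for name in agents)
```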

Actions are passed as a dictionary; the agent names can be accessed through the agents property.

[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2025-11-05 22:48:39,921 gym                            INFO       <0.00> === STARTING STEP ===
2025-11-05 22:48:39,922 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-11-05 22:48:39,922 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-528) tasked for imaging
2025-11-05 22:48:39,923 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-528) window enabled: 679.7 to 815.0
2025-11-05 22:48:39,924 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 815.0
2025-11-05 22:48:39,924 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-11-05 22:48:39,925 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-166) tasked for imaging
2025-11-05 22:48:39,927 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-166) window enabled: 644.8 to 696.8
2025-11-05 22:48:39,927 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 696.8
2025-11-05 22:48:39,928 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-11-05 22:48:39,929 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-176) tasked for imaging
2025-11-05 22:48:39,929 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-176) window enabled: 1032.1 to 1200.0
2025-11-05 22:48:39,930 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 1200.0
2025-11-05 22:48:40,064 sats.satellite.EO-2            INFO       <647.00> EO-2: imaged Target(tgt-166)
2025-11-05 22:48:40,065 data.base                      INFO       <647.00> Total reward: {'EO-2': 0.9210245875955466}
2025-11-05 22:48:40,066 sats.satellite.EO-2            INFO       <647.00> EO-2: Satellite EO-2 requires retasking
2025-11-05 22:48:40,068 sats.satellite.EO-2            INFO       <647.00> EO-2: Finding opportunity windows from 1200.00 to 1800.00 seconds
2025-11-05 22:48:40,106 sats.satellite.EO-3            INFO       <647.00> EO-3: Finding opportunity windows from 1200.00 to 1800.00 seconds
2025-11-05 22:48:40,166 gym                            INFO       <647.00> Step reward: {'EO-2': 0.9210245875955466}
[14]:
observation
[14]:
{'EO-1': array([ 0.09859741, -0.03551423,  0.49198952,  0.00573547,  0.19679985,
         0.01870813,  0.45048406,  0.00248848,  0.55164225,  0.01762653,
         0.02325649,  0.01388872,  0.35569037,  0.03332458,  0.02772981,
         0.03467764,  0.67272115,  0.04605867,  0.2139736 ,  0.07489779]),
 'EO-2': array([ 0.47492646, -0.03580397,  0.30780121, -0.03233286,  0.84733851,
         0.00786825,  0.98881349,  0.01349388,  0.84796453,  0.05064862,
         0.57450089,  0.03511686,  0.9865236 ,  0.0674651 ,  0.80830606,
         0.09871199,  0.69594179,  0.10978892,  0.07363774,  0.14149279]),
 'EO-3': array([ 0.11533782, -0.00608844,  0.85873453,  0.01749988,  0.48530866,
         0.05399338,  0.02190762,  0.06756675,  0.95281325,  0.0802063 ,
         0.67938977,  0.08240643,  0.98664312,  0.08783784,  0.42689825,
         0.12397855,  0.42931792,  0.11831221,  0.96460181,  0.14865361])}
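Each 20-element observation appears to interleave the two requested properties for the 10 upcoming opportunities: the priority, followed by opportunity_open divided by norm=5700.0. This layout is inferred from the values shown rather than stated in this example, so the sketch below, which unpacks the EO-1 observation with a hypothetical unpack helper, rests on that assumption.

```python
# EO-1 observation copied from the output above.
eo1_obs = [
    0.09859741, -0.03551423, 0.49198952, 0.00573547, 0.19679985,
    0.01870813, 0.45048406, 0.00248848, 0.55164225, 0.01762653,
    0.02325649, 0.01388872, 0.35569037, 0.03332458, 0.02772981,
    0.03467764, 0.67272115, 0.04605867, 0.2139736, 0.07489779,
]
NORM = 5700.0  # the norm passed for "opportunity_open" in the observation spec


def unpack(obs, norm=NORM):
    """Hypothetical helper: split a flat observation into (priority, seconds) pairs.

    Assumes the layout [priority_0, open_0, priority_1, open_1, ...].
    """
    return [(obs[i], obs[i + 1] * norm) for i in range(0, len(obs), 2)]


opportunities = unpack(eo1_obs)
priority_7, opens_in_7 = opportunities[7]
```

Under this assumption, opportunity 7 has a priority of about 0.028 and opens roughly 197.7 seconds after the observation was taken at 647 seconds. This agrees with the log of the following step, in which action 7 tasks EO-1 with a window opening at 844.7 and the resulting image is worth about 0.0277.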

Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.

[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2025-11-05 22:48:40,182 gym                            INFO       <647.00> === STARTING STEP ===
2025-11-05 22:48:40,183 sats.satellite.EO-1            INFO       <647.00> EO-1: target index 7 tasked
2025-11-05 22:48:40,184 sats.satellite.EO-1            INFO       <647.00> EO-1: Target(tgt-328) tasked for imaging
2025-11-05 22:48:40,185 sats.satellite.EO-1            INFO       <647.00> EO-1: Target(tgt-328) window enabled: 844.7 to 1037.7
2025-11-05 22:48:40,185 sats.satellite.EO-1            INFO       <647.00> EO-1: setting timed terminal event at 1037.7
2025-11-05 22:48:40,186 sats.satellite.EO-2            INFO       <647.00> EO-2: target index 9 tasked
2025-11-05 22:48:40,187 sats.satellite.EO-2            INFO       <647.00> EO-2: Target(tgt-704) tasked for imaging
2025-11-05 22:48:40,188 sats.satellite.EO-2            INFO       <647.00> EO-2: Target(tgt-704) window enabled: 1453.5 to 1573.9
2025-11-05 22:48:40,188 sats.satellite.EO-2            INFO       <647.00> EO-2: setting timed terminal event at 1573.9
2025-11-05 22:48:40,230 sats.satellite.EO-1            INFO       <847.00> EO-1: imaged Target(tgt-328)
2025-11-05 22:48:40,232 data.base                      INFO       <847.00> Total reward: {'EO-1': 0.027729808348439078}
2025-11-05 22:48:40,232 sats.satellite.EO-1            INFO       <847.00> EO-1: Satellite EO-1 requires retasking
2025-11-05 22:48:40,234 sats.satellite.EO-1            INFO       <847.00> EO-1: Finding opportunity windows from 1200.00 to 1800.00 seconds
2025-11-05 22:48:40,281 sats.satellite.EO-2            INFO       <847.00> EO-2: Finding opportunity windows from 1800.00 to 2400.00 seconds
2025-11-05 22:48:40,318 gym                            INFO       <847.00> Step reward: {'EO-1': -0.9722701916515609}
2025-11-05 22:48:40,319 gym                            INFO       <847.00> Episode terminated: ['EO-1']
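In the PettingZoo parallel API, step returns per-agent dictionaries, and agents that have terminated are no longer expected to receive actions. The following minimal sketch selects actions only for agents that are still active. The flag dictionaries are illustrative values that mirror the log above (only EO-1 terminated), and choose_action is a hypothetical placeholder policy.

```python
# Illustrative flags mirroring the log above, where only EO-1 terminated.
terminated = {"EO-1": True, "EO-2": False, "EO-3": False}
truncated = {"EO-1": False, "EO-2": False, "EO-3": False}


def choose_action(agent):
    """Hypothetical placeholder policy: always pick the first upcoming target."""
    return 0


# Keep only agents that are neither terminated nor truncated.
active_agents = [
    agent
    for agent in terminated
    if not (terminated[agent] or truncated.get(agent, False))
]

next_actions = {agent: choose_action(agent) for agent in active_agents}
```

The episode can then continue with the remaining satellites, whereas the Gym API ends it as soon as any satellite fails.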