Multi-Agent Environments

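Two multi-agent environments are given in the package: GeneralSatelliteTasking, which follows the Gym API, and ConstellationTasking, which follows the PettingZoo parallel API.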

The latter is preferable for multi-agent RL (MARL) settings, as most algorithms are designed for this kind of API.

Configuring the Environment

For this example, a multi-satellite target imaging environment is used. The goal is to maximize the total value of unique images taken.

As usual, the satellite type is defined first.

[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.

[2]:

from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
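
To make "correlated" randomization concrete, the following is a minimal sketch of what a randomizer of this kind might look like. It assumes, purely for illustration, that a randomizer is a callable that receives the list of satellites and returns a dictionary of per-satellite argument overrides. The names make_single_plane_randomizer and true_anomaly_deg are hypothetical, plain strings stand in for satellite objects, and this is not the implementation of walker_delta_args.

```python
import random


def make_single_plane_randomizer(seed=None):
    """Return a hypothetical randomizer that phases satellites evenly in one plane."""
    rng = random.Random(seed)

    def randomizer(satellites):
        # A single shared random offset is what correlates the satellites:
        # each one is placed relative to the same draw, not drawn independently.
        offset = rng.uniform(0.0, 360.0)
        spacing = 360.0 / len(satellites)
        return {
            sat: {"true_anomaly_deg": (offset + i * spacing) % 360.0}
            for i, sat in enumerate(satellites)
        }

    return randomizer


# Strings stand in for satellite objects in this sketch.
randomizer = make_single_plane_randomizer(seed=0)
overrides = randomizer(["EO-1", "EO-2", "EO-3"])
```

With three satellites, consecutive entries of overrides are separated by 120 degrees regardless of the random offset.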

Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.

[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
2025-11-19 19:40:03,642 gym                            INFO       Resetting environment with seed=3349979043
2025-11-19 19:40:03,645 scene.targets                  INFO       Generating 1000 targets
2025-11-19 19:40:03,814 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-19 19:40:03,857 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-19 19:40:03,894 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-19 19:40:03,926 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-19 19:40:03,965 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-19 19:40:04,006 gym                            INFO       <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a tuple with one entry per satellite (a list is used below). The step ends the first time any satellite completes its action.

[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2025-11-19 19:40:04,020 gym                            INFO       <0.00> === STARTING STEP ===
2025-11-19 19:40:04,021 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-11-19 19:40:04,021 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-87) tasked for imaging
2025-11-19 19:40:04,022 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-87) window enabled: 410.5 to 587.7
2025-11-19 19:40:04,023 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 587.7
2025-11-19 19:40:04,024 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-11-19 19:40:04,024 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-667) tasked for imaging
2025-11-19 19:40:04,025 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-667) window enabled: 907.1 to 1043.9
2025-11-19 19:40:04,025 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 1043.9
2025-11-19 19:40:04,026 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-11-19 19:40:04,027 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-336) tasked for imaging
2025-11-19 19:40:04,027 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-336) window enabled: 462.0 to 597.1
2025-11-19 19:40:04,028 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 597.1
2025-11-19 19:40:04,117 sats.satellite.EO-1            INFO       <413.00> EO-1: imaged Target(tgt-87)
2025-11-19 19:40:04,118 data.base                      INFO       <413.00> Total reward: {'EO-1': 0.3276583307015607}
2025-11-19 19:40:04,119 sats.satellite.EO-1            INFO       <413.00> EO-1: Satellite EO-1 requires retasking
2025-11-19 19:40:04,119 sats.satellite.EO-1            INFO       <413.00> EO-1: Finding opportunity windows from 1200.00 to 1800.00 seconds
2025-11-19 19:40:04,185 gym                            INFO       <413.00> Step reward: 0.3276583307015607
[6]:
observation
[6]:
(array([ 0.58427731, -0.00120939,  0.24140079,  0.04060442,  0.0250781 ,
         0.05462147,  0.74890347,  0.08024313,  0.38243619,  0.07966262,
         0.17232041,  0.09661991,  0.95106705,  0.08249307,  0.35250519,
         0.1327629 ,  0.78314087,  0.10567535,  0.04062939,  0.15113485]),
 array([ 0.03281195, -0.00586467,  0.85328538,  0.00617142,  0.29033318,
         0.04106011,  0.43530808,  0.04217314,  0.28150206,  0.05191862,
         0.39376621,  0.05807889,  0.39254123,  0.06266203,  0.18000863,
         0.075721  ,  0.34270875,  0.08668734,  0.91909931,  0.10106962]),
 array([ 0.30366652, -0.0048208 ,  0.54083216, -0.02136069,  0.34412046,
        -0.01766037,  0.65057345, -0.00878578,  0.29893233,  0.0168131 ,
         0.22241735,  0.0085914 ,  0.90081027,  0.00145725,  0.8725144 ,
         0.0020068 ,  0.59793886, -0.00313329,  0.86175301,  0.01705694]))
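
Each 20-element observation can be read as 10 upcoming opportunities with 2 properties each. The sketch below splits a flat observation into (priority, seconds until the window opens) pairs. It assumes that the properties are interleaved per target in the order listed in observation_spec and that norm is a divisor applied to the raw value; both are inferred from the numbers on this page rather than stated by it. The helper decode_observation is a hypothetical name.

```python
OPEN_NORM = 5700.0  # seconds; the `norm` given for "opportunity_open" in observation_spec


def decode_observation(flat, n_props=2):
    """Split a flat observation into (priority, seconds_until_open) pairs."""
    pairs = []
    for i in range(0, len(flat), n_props):
        priority, open_norm = flat[i], flat[i + 1]
        # Undo the assumed normalization to recover seconds.
        pairs.append((priority, open_norm * OPEN_NORM))
    return pairs


# First two upcoming targets of EO-1, copied from the observation above.
eo1_head = [0.58427731, -0.00120939, 0.24140079, 0.04060442]
decoded = decode_observation(eo1_head)
```

Under these assumptions, the first target has priority 0.584 and a window that opened about 6.9 seconds ago. This agrees with the log of the following step, in which EO-1 is tasked at 413.0 with a target whose window opened at 406.1 and earns a reward of 0.584 for imaging it.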

At this point, either every satellite can be retasked, or satellites can continue their previous action by passing None as the action. To see which satellites must be retasked (i.e. their previous action is done and they have nothing more to do), look at "requires_retasking" in each satellite’s info.

[7]:
info
[7]:
{'EO-1': {'requires_retasking': True},
 'EO-2': {'requires_retasking': False},
 'EO-3': {'requires_retasking': False},
 'd_ts': 413.0}

Based on this dictionary, only the satellite that requires retasking is given a new action; the others continue their current action by being passed None.

[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[0, None, None]
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2025-11-19 19:40:04,207 gym                            INFO       <413.00> === STARTING STEP ===
2025-11-19 19:40:04,208 sats.satellite.EO-1            INFO       <413.00> EO-1: target index 0 tasked
2025-11-19 19:40:04,208 sats.satellite.EO-1            INFO       <413.00> EO-1: Target(tgt-486) tasked for imaging
2025-11-19 19:40:04,209 sats.satellite.EO-1            INFO       <413.00> EO-1: Target(tgt-486) window enabled: 406.1 to 596.5
2025-11-19 19:40:04,210 sats.satellite.EO-1            INFO       <413.00> EO-1: setting timed terminal event at 596.5
2025-11-19 19:40:04,222 sats.satellite.EO-1            INFO       <462.00> EO-1: imaged Target(tgt-486)
2025-11-19 19:40:04,223 data.base                      INFO       <462.00> Total reward: {'EO-1': 0.5842773069827059}
2025-11-19 19:40:04,224 sats.satellite.EO-1            INFO       <462.00> EO-1: Satellite EO-1 requires retasking
2025-11-19 19:40:04,226 gym                            INFO       <462.00> Step reward: 0.5842773069827059
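
The retasking pattern used in the cells above can be wrapped in a small helper. This is a sketch, not part of the package: actions_for and policy are hypothetical names, and the policy here is a placeholder that always picks action 0. Iterating over satellite names rather than over the keys of info avoids the non-satellite "d_ts" entry.

```python
def actions_for(info, satellite_names, policy):
    """Give a new action only to satellites flagged as requiring retasking."""
    return [
        policy(name) if info[name]["requires_retasking"] else None
        for name in satellite_names
    ]


# The info dictionary shown earlier, including its "d_ts" entry.
info_example = {
    "EO-1": {"requires_retasking": True},
    "EO-2": {"requires_retasking": False},
    "EO-3": {"requires_retasking": False},
    "d_ts": 413.0,
}
example_actions = actions_for(
    info_example, ["EO-1", "EO-2", "EO-3"], policy=lambda name: 0
)
```

Here example_actions reproduces the [0, None, None] list built by hand above.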

With the Gym API, the episode terminates if any satellite dies. To demonstrate this, one satellite is forcibly killed by replacing its is_alive method with a mock that writes a battery storage level of zero.

[10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2025-11-19 19:40:04,233 gym                            INFO       <462.00> === STARTING STEP ===
2025-11-19 19:40:04,233 sats.satellite.EO-1            INFO       <462.00> EO-1: target index 6 tasked
2025-11-19 19:40:04,234 sats.satellite.EO-1            INFO       <462.00> EO-1: Target(tgt-142) tasked for imaging
2025-11-19 19:40:04,235 sats.satellite.EO-1            INFO       <462.00> EO-1: Target(tgt-142) window enabled: 1169.7 to 1350.3
2025-11-19 19:40:04,235 sats.satellite.EO-1            INFO       <462.00> EO-1: setting timed terminal event at 1350.3
2025-11-19 19:40:04,236 sats.satellite.EO-2            INFO       <462.00> EO-2: target index 7 tasked
2025-11-19 19:40:04,236 sats.satellite.EO-2            INFO       <462.00> EO-2: Target(tgt-676) tasked for imaging
2025-11-19 19:40:04,238 sats.satellite.EO-2            INFO       <462.00> EO-2: Target(tgt-676) window enabled: 844.6 to 1027.6
2025-11-19 19:40:04,238 sats.satellite.EO-2            INFO       <462.00> EO-2: setting timed terminal event at 1027.6
2025-11-19 19:40:04,239 sats.satellite.EO-3            INFO       <462.00> EO-3: target index 9 tasked
2025-11-19 19:40:04,240 sats.satellite.EO-3            INFO       <462.00> EO-3: Target(tgt-189) tasked for imaging
2025-11-19 19:40:04,241 sats.satellite.EO-3            INFO       <462.00> EO-3: Target(tgt-189) window enabled: 510.2 to 600.0
2025-11-19 19:40:04,241 sats.satellite.EO-3            INFO       <462.00> EO-3: setting timed terminal event at 600.0
2025-11-19 19:40:04,255 sats.satellite.EO-3            INFO       <522.00> EO-3: imaged Target(tgt-189)
2025-11-19 19:40:04,256 data.base                      INFO       <522.00> Total reward: {'EO-3': 0.8617530117573827}
2025-11-19 19:40:04,256 sats.satellite.EO-3            INFO       <522.00> EO-3: Satellite EO-3 requires retasking
2025-11-19 19:40:04,258 sats.satellite.EO-1            WARNING    <522.00> EO-1: failed battery_valid check
2025-11-19 19:40:04,259 gym                            INFO       <522.00> Step reward: -0.13824698824261727
2025-11-19 19:40:04,260 gym                            INFO       <522.00> Episode terminated: True
2025-11-19 19:40:04,260 gym                            INFO       <522.00> Episode truncated: False
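
The negative step reward in this log can be reproduced by hand. The logged numbers are consistent with the per-satellite rewards being summed into a single scalar and a penalty of -1.0 being added for the failed satellite. The penalty value is inferred from the log, not taken from the package's documentation, and scalar_step_reward is a hypothetical helper.

```python
def scalar_step_reward(rewards, n_failed, failure_penalty=-1.0):
    """Sum per-satellite rewards and add a penalty for each failed satellite."""
    return sum(rewards.values()) + n_failed * failure_penalty


# EO-3 earned 0.8617... for imaging tgt-189; EO-1 failed its battery check.
step_reward = scalar_step_reward({"EO-3": 0.8617530117573827}, n_failed=1)
```

The result agrees with the logged step reward of about -0.138.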

PettingZoo API

The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo documentation for a full description of the API. It generally exchanges dictionaries keyed by agent name, rather than tuples.

[11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
2025-11-19 19:40:04,268                                WARNING    Creating logger for new env on PID=4345. Old environments in process may now log times incorrectly.
2025-11-19 19:40:04,393 gym                            INFO       Resetting environment with seed=4100363403
2025-11-19 19:40:04,395 scene.targets                  INFO       Generating 1000 targets
2025-11-19 19:40:04,552 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-19 19:40:04,596 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-19 19:40:04,634 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-19 19:40:04,673 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-19 19:40:04,712 gym                            INFO       <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the agents property.

[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2025-11-19 19:40:04,725 gym                            INFO       <0.00> === STARTING STEP ===
2025-11-19 19:40:04,726 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-11-19 19:40:04,726 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-852) tasked for imaging
2025-11-19 19:40:04,727 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-852) window enabled: 286.1 to 411.7
2025-11-19 19:40:04,727 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 411.7
2025-11-19 19:40:04,728 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-11-19 19:40:04,729 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-950) tasked for imaging
2025-11-19 19:40:04,729 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-950) window enabled: 601.3 to 808.1
2025-11-19 19:40:04,730 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 808.1
2025-11-19 19:40:04,731 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-11-19 19:40:04,732 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-716) tasked for imaging
2025-11-19 19:40:04,732 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-716) window enabled: 380.5 to 550.5
2025-11-19 19:40:04,733 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 550.5
2025-11-19 19:40:04,795 sats.satellite.EO-1            INFO       <289.00> EO-1: imaged Target(tgt-852)
2025-11-19 19:40:04,796 data.base                      INFO       <289.00> Total reward: {'EO-1': 0.2708989913511971}
2025-11-19 19:40:04,797 sats.satellite.EO-1            INFO       <289.00> EO-1: Satellite EO-1 requires retasking
2025-11-19 19:40:04,799 sats.satellite.EO-3            INFO       <289.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-19 19:40:04,837 gym                            INFO       <289.00> Step reward: {'EO-1': 0.2708989913511971}
[14]:
observation
[14]:
{'EO-1': array([ 9.74746973e-01, -3.31563562e-02,  1.18376987e-01, -2.81998295e-02,
         8.88759739e-01, -5.85852040e-03,  8.80114112e-01,  1.63428283e-02,
         2.13597203e-01, -1.22314895e-02,  8.34582279e-01, -1.88782490e-04,
         4.38237820e-01,  3.32958143e-03,  2.40970045e-01,  8.75483855e-03,
         7.29299995e-01,  7.67320850e-03,  5.29225587e-01,  9.11020725e-03]),
 'EO-2': array([ 0.09508472, -0.01565844,  0.88307151, -0.0112318 ,  0.83373292,
         0.05478385,  0.49890348,  0.06456441,  0.62449781,  0.09459119,
         0.72492498,  0.08063659,  0.88356957,  0.09625589,  0.40387205,
         0.09782956,  0.90064637,  0.11599226,  0.88528574,  0.12752772]),
 'EO-3': array([ 0.84198769, -0.02624412,  0.31969192, -0.02512481,  0.76253085,
        -0.02164455,  0.22407582, -0.00839274,  0.68797458,  0.01605513,
         0.58772462,  0.03759081,  0.72131422,  0.0406082 ,  0.38912889,
         0.02923103,  0.61861145,  0.03767165,  0.06869692,  0.05885124])}
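
Code written against the tuple-style Gym API can be adapted to the dictionary form with a small conversion. The sketch below pairs agent names with positional actions; to_action_dict is a hypothetical helper, and the agent list is written out by hand in place of env.agents.

```python
def to_action_dict(agents, actions):
    """Pair each agent name with the action at the same position."""
    if len(agents) != len(actions):
        raise ValueError("expected exactly one action per agent")
    return dict(zip(agents, actions))


# Same actions as the Gym example, keyed by agent name.
action_dict = to_action_dict(["EO-1", "EO-2", "EO-3"], [7, 9, 8])
```

The resulting dictionary matches the one passed to env.step in the PettingZoo example above.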

Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.

[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2025-11-19 19:40:04,853 gym                            INFO       <289.00> === STARTING STEP ===
2025-11-19 19:40:04,854 sats.satellite.EO-1            INFO       <289.00> EO-1: target index 7 tasked
2025-11-19 19:40:04,855 sats.satellite.EO-1            INFO       <289.00> EO-1: Target(tgt-958) tasked for imaging
2025-11-19 19:40:04,856 sats.satellite.EO-1            INFO       <289.00> EO-1: Target(tgt-958) window enabled: 338.9 to 484.7
2025-11-19 19:40:04,856 sats.satellite.EO-1            INFO       <289.00> EO-1: setting timed terminal event at 484.7
2025-11-19 19:40:04,857 sats.satellite.EO-2            INFO       <289.00> EO-2: target index 9 tasked
2025-11-19 19:40:04,857 sats.satellite.EO-2            INFO       <289.00> EO-2: Target(tgt-189) tasked for imaging
2025-11-19 19:40:04,858 sats.satellite.EO-2            INFO       <289.00> EO-2: Target(tgt-189) window enabled: 1015.9 to 1197.0
2025-11-19 19:40:04,859 sats.satellite.EO-2            INFO       <289.00> EO-2: setting timed terminal event at 1197.0
2025-11-19 19:40:04,874 sats.satellite.EO-1            INFO       <349.00> EO-1: imaged Target(tgt-958)
2025-11-19 19:40:04,875 data.base                      INFO       <349.00> Total reward: {'EO-1': 0.2409700452561282}
2025-11-19 19:40:04,875 sats.satellite.EO-1            INFO       <349.00> EO-1: Satellite EO-1 requires retasking
2025-11-19 19:40:04,877 sats.satellite.EO-1            INFO       <349.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-19 19:40:04,917 gym                            INFO       <349.00> Step reward: {'EO-1': -0.7590299547438718}
2025-11-19 19:40:04,918 gym                            INFO       <349.00> Episode terminated: ['EO-1']
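
After a step like this one, a control loop would typically stop issuing actions to the failed agent. The sketch below assumes that terminated and truncated are dictionaries keyed by agent name, as in the PettingZoo parallel API. The values are illustrative, chosen to match the log line that lists EO-1 as terminated, and live_agents is a hypothetical helper.

```python
def live_agents(agents, terminated, truncated):
    """Return the agents that are neither terminated nor truncated."""
    return [
        agent
        for agent in agents
        if not terminated.get(agent, False) and not truncated.get(agent, False)
    ]


# Illustrative flags matching "Episode terminated: ['EO-1']" in the log.
terminated_example = {"EO-1": True, "EO-2": False, "EO-3": False}
truncated_example = {"EO-1": False, "EO-2": False, "EO-3": False}
remaining = live_agents(
    ["EO-1", "EO-2", "EO-3"], terminated_example, truncated_example
)
```

Only EO-2 and EO-3 remain. The logged reward for EO-1, about -0.759, equals its imaging reward of 0.241 minus 1.0, which suggests that the failure penalty is applied to the failed agent alone rather than to the whole constellation.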