Multi-Agent Environments

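Two multi-agent environments are provided in the package: GeneralSatelliteTasking, which follows the Gym API and exchanges actions and observations as tuples, and ConstellationTasking, which implements the PettingZoo parallel API and keys them by agent name.

The latter is preferable for multi-agent RL (MARL) settings, as most algorithms are designed for this kind of API.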

Configuring the Environment

For this example, a multi-satellite target imaging environment is used. The goal is to maximize the total value of unique images taken.

As usual, the satellite type is defined first.

[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.

[2]:

from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
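To make the "single-plane Walker-delta" geometry concrete, here is a minimal sketch of the textbook Walker-delta pattern (t satellites, p planes, relative phasing f). It is not the implementation behind walker_delta_args; the function name walker_delta_slots and its return format are hypothetical, and angles are assumed to be in degrees.

```python
def walker_delta_slots(n_sats, n_planes=1, rel_phasing=0, inc_deg=60.0):
    """Return inclination, RAAN, and in-plane phase (degrees) per satellite.

    Textbook Walker-delta layout; hypothetical helper, not part of bsk_rl.
    """
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    per_plane = n_sats // n_planes
    slots = []
    for plane in range(n_planes):
        # Planes are spread evenly in right ascension of the ascending node.
        raan = 360.0 * plane / n_planes
        for k in range(per_plane):
            # Even in-plane spacing plus the inter-plane offset f * 360 / t.
            phase = 360.0 * k / per_plane + 360.0 * rel_phasing * plane / n_sats
            slots.append({"inc": inc_deg, "raan": raan, "phase": phase % 360.0})
    return slots


# Three satellites in one plane, as in this example.
slots = walker_delta_slots(n_sats=3, n_planes=1, inc_deg=60.0)
for slot in slots:
    print(slot)
```

With a single plane, every satellite shares the same inclination and RAAN, and the satellites differ only by an evenly spaced in-plane phase, which is why the constellation is described as trivial.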

Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.

[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
2025-12-03 22:23:13,563 gym                            INFO       Resetting environment with seed=515624232
2025-12-03 22:23:13,566 scene.targets                  INFO       Generating 1000 targets
2025-12-03 22:23:13,735 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-12-03 22:23:13,781 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-12-03 22:23:13,823 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-12-03 22:23:13,857 gym                            INFO       <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a sequence with one entry per satellite, in the same order as the satellites list. The step ends the first time any satellite completes its action.

[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2025-12-03 22:23:13,871 gym                            INFO       <0.00> === STARTING STEP ===
2025-12-03 22:23:13,872 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-12-03 22:23:13,872 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-141) tasked for imaging
2025-12-03 22:23:13,873 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-141) window enabled: 256.1 to 461.4
2025-12-03 22:23:13,874 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 461.4
2025-12-03 22:23:13,875 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-12-03 22:23:13,875 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-633) tasked for imaging
2025-12-03 22:23:13,877 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-633) window enabled: 186.0 to 393.2
2025-12-03 22:23:13,877 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 393.2
2025-12-03 22:23:13,878 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-12-03 22:23:13,879 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-63) tasked for imaging
2025-12-03 22:23:13,880 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-63) window enabled: 506.9 to 600.0
2025-12-03 22:23:13,880 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 600.0
2025-12-03 22:23:13,920 sats.satellite.EO-2            INFO       <188.00> EO-2: imaged Target(tgt-633)
2025-12-03 22:23:13,921 data.base                      INFO       <188.00> Total reward: {'EO-2': 0.6549311444791328}
2025-12-03 22:23:13,921 sats.satellite.EO-2            INFO       <188.00> EO-2: Satellite EO-2 requires retasking
2025-12-03 22:23:13,923 sats.satellite.EO-3            INFO       <188.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-12-03 22:23:13,965 sats.access_satellite          WARNING    <188.00> initial_generation_duration is shorter than the maximum window length; some windows may be neglected.
2025-12-03 22:23:13,968 gym                            INFO       <188.00> Step reward: 0.6549311444791328
[6]:
observation
[6]:
(array([ 0.15999164, -0.02849177,  0.98221026, -0.02898674,  0.03140379,
         0.01216677,  0.68629755, -0.00504712,  0.77998552, -0.01560557,
         0.79314365,  0.01557904,  0.38686879,  0.01194923,  0.22736717,
         0.0185298 ,  0.50905583,  0.05912701,  0.52422303,  0.05114976]),
 array([ 0.69966823, -0.02002935,  0.75000477, -0.01508706,  0.27548715,
        -0.03129775,  0.20250629, -0.00375861,  0.40811223,  0.01063389,
         0.31548965,  0.01023244,  0.76420993,  0.00709782,  0.99434525,
         0.04578693,  0.16361287,  0.0486464 ,  0.31771701,  0.06349395]),
 array([0.3139193 , 0.01212284, 0.0949052 , 0.03401048, 0.89744615,
        0.05595508, 0.24729176, 0.05803803, 0.62928106, 0.06297189,
        0.55009047, 0.08974452, 0.23991074, 0.08617837, 0.89783777,
        0.10211413, 0.73509728, 0.1354319 , 0.6431375 , 0.16534245]))
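Each 20-element observation can be read as 10 upcoming opportunities with two properties each. The sketch below assumes the values are interleaved as (priority, opportunity_open / 5700) pairs and that opportunity_open is measured relative to the current time; both assumptions are inferred from the printed values and logs, not from the library documentation. The helper name decode_opportunities is hypothetical.

```python
def decode_opportunities(obs, norm=5700.0):
    """Split a flat observation into (priority, seconds-until-open) pairs.

    Assumes an interleaved layout [prio_0, open_0, prio_1, open_1, ...].
    """
    pairs = []
    for i in range(0, len(obs), 2):
        pairs.append({"priority": obs[i], "opens_in_s": obs[i + 1] * norm})
    return pairs


# EO-1's observation, copied from the output above.
eo1_obs = [
    0.15999164, -0.02849177, 0.98221026, -0.02898674, 0.03140379,
    0.01216677, 0.68629755, -0.00504712, 0.77998552, -0.01560557,
    0.79314365, 0.01557904, 0.38686879, 0.01194923, 0.22736717,
    0.0185298, 0.50905583, 0.05912701, 0.52422303, 0.05114976,
]

upcoming = decode_opportunities(eo1_obs)
for index, opp in enumerate(upcoming):
    print(index, round(opp["priority"], 3), round(opp["opens_in_s"], 1))
```

Under these assumptions, entry 6 opens about 68.1 seconds after the current time of 188 seconds, which lines up with the 256.1-second window start logged for Target(tgt-141). A negative opens_in_s would indicate a window that is already open.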

At this point, either every satellite can be retasked, or satellites can continue their previous action by passing None as the action. To see which satellites must be retasked (i.e. their previous action is done and they have nothing more to do), look at "requires_retasking" in each satellite’s info.

[7]:
info
[7]:
{'EO-1': {'requires_retasking': False},
 'EO-2': {'requires_retasking': True},
 'EO-3': {'requires_retasking': False},
 'd_ts': 188.0}

Based on this dictionary, only the satellite that requires retasking is given a new action here; the others receive None and continue their current action.

[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[None, 0, None]
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2025-12-03 22:23:13,989 gym                            INFO       <188.00> === STARTING STEP ===
2025-12-03 22:23:13,990 sats.satellite.EO-2            INFO       <188.00> EO-2: target index 0 tasked
2025-12-03 22:23:13,991 sats.satellite.EO-2            INFO       <188.00> EO-2: Target(tgt-280) tasked for imaging
2025-12-03 22:23:13,992 sats.satellite.EO-2            INFO       <188.00> EO-2: Target(tgt-280) window enabled: 73.8 to 193.1
2025-12-03 22:23:13,992 sats.satellite.EO-2            INFO       <188.00> EO-2: setting timed terminal event at 193.1
2025-12-03 22:23:13,996 sats.satellite.EO-2            INFO       <194.00> EO-2: timed termination at 193.1 for Target(tgt-280) window
2025-12-03 22:23:13,997 data.base                      INFO       <194.00> Total reward: {}
2025-12-03 22:23:13,997 sats.satellite.EO-2            INFO       <194.00> EO-2: Satellite EO-2 requires retasking
2025-12-03 22:23:14,000 gym                            INFO       <194.00> Step reward: 0.0
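The retasking rule used above can be wrapped in a small helper that takes any action-selection function. This is only a sketch: retask_only_idle and choose_action are hypothetical names, not part of bsk_rl, and the info dictionary is copied from the earlier output so the example runs without a simulator.

```python
def retask_only_idle(info, agent_names, choose_action):
    """Build an action list: a new action for idle satellites, None otherwise."""
    actions = []
    for name in agent_names:
        if info[name]["requires_retasking"]:
            actions.append(choose_action(name))
        else:
            # None lets the satellite continue its current action.
            actions.append(None)
    return actions


# info as returned by the first step in this example.
info = {
    "EO-1": {"requires_retasking": False},
    "EO-2": {"requires_retasking": True},
    "EO-3": {"requires_retasking": False},
    "d_ts": 188.0,
}

# Placeholder policy: always pick the nearest upcoming target (index 0).
actions = retask_only_idle(info, ["EO-1", "EO-2", "EO-3"], lambda name: 0)
print(actions)
```

Iterating over an explicit list of agent names, rather than over the keys of info, avoids tripping over non-agent entries such as "d_ts".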

With the Gym API, the episode terminates as soon as any satellite fails. To demonstrate this, one satellite is forcibly killed by replacing its is_alive check with a mock that reports an empty battery.

[10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2025-12-03 22:23:14,005 gym                            INFO       <194.00> === STARTING STEP ===
2025-12-03 22:23:14,006 sats.satellite.EO-1            INFO       <194.00> EO-1: target index 6 tasked
2025-12-03 22:23:14,007 sats.satellite.EO-1            INFO       <194.00> EO-1: Target(tgt-141) window enabled: 256.1 to 461.4
2025-12-03 22:23:14,007 sats.satellite.EO-1            INFO       <194.00> EO-1: setting timed terminal event at 461.4
2025-12-03 22:23:14,008 sats.satellite.EO-2            INFO       <194.00> EO-2: target index 7 tasked
2025-12-03 22:23:14,009 sats.satellite.EO-2            INFO       <194.00> EO-2: Target(tgt-274) tasked for imaging
2025-12-03 22:23:14,009 sats.satellite.EO-2            INFO       <194.00> EO-2: Target(tgt-274) window enabled: 465.3 to 599.3
2025-12-03 22:23:14,010 sats.satellite.EO-2            INFO       <194.00> EO-2: setting timed terminal event at 599.3
2025-12-03 22:23:14,010 sats.satellite.EO-3            INFO       <194.00> EO-3: target index 9 tasked
2025-12-03 22:23:14,011 sats.satellite.EO-3            INFO       <194.00> EO-3: Target(tgt-842) tasked for imaging
2025-12-03 22:23:14,012 sats.satellite.EO-3            INFO       <194.00> EO-3: Target(tgt-842) window enabled: 1130.5 to 1197.1
2025-12-03 22:23:14,012 sats.satellite.EO-3            INFO       <194.00> EO-3: setting timed terminal event at 1197.1
2025-12-03 22:23:14,030 sats.satellite.EO-1            INFO       <259.00> EO-1: imaged Target(tgt-141)
2025-12-03 22:23:14,031 data.base                      INFO       <259.00> Total reward: {'EO-1': 0.386868785309475}
2025-12-03 22:23:14,031 sats.satellite.EO-1            INFO       <259.00> EO-1: Satellite EO-1 requires retasking
2025-12-03 22:23:14,032 sats.satellite.EO-1            INFO       <259.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-12-03 22:23:14,071 sats.satellite.EO-2            INFO       <259.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-12-03 22:23:14,113 sats.satellite.EO-1            WARNING    <259.00> EO-1: failed battery_valid check
2025-12-03 22:23:14,114 gym                            INFO       <259.00> Step reward: -0.613131214690525
2025-12-03 22:23:14,115 gym                            INFO       <259.00> Episode terminated: True
2025-12-03 22:23:14,115 gym                            INFO       <259.00> Episode truncated: False
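The negative step reward in the log is consistent with EO-1's imaging reward combined with a penalty of -1.0 for the failed satellite. The arithmetic is sketched below; the penalty value is an assumption inferred from the logged numbers, not read from the environment configuration.

```python
# Imaging reward logged for this step.
imaging_reward = {"EO-1": 0.386868785309475}

# Assumed penalty applied when a satellite fails (inferred from the log).
failure_penalty = -1.0

# In the Gym API the step reward is a single scalar for all satellites.
step_reward = sum(imaging_reward.values()) + failure_penalty
print(step_reward)
```

The result is close to the logged value of -0.613131214690525, up to floating-point rounding.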

PettingZoo API

The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo documentation for a full description of the API. Its main difference is that actions, observations, rewards, and other per-agent values are passed as dictionaries keyed by agent name rather than as tuples.

[11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
2025-12-03 22:23:14,123                                WARNING    Creating logger for new env on PID=4294. Old environments in process may now log times incorrectly.
2025-12-03 22:23:14,227 gym                            INFO       Resetting environment with seed=1179595597
2025-12-03 22:23:14,229 scene.targets                  INFO       Generating 1000 targets
2025-12-03 22:23:14,384 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-12-03 22:23:14,426 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-12-03 22:23:14,466 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-12-03 22:23:14,506 gym                            INFO       <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the agents property.

[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2025-12-03 22:23:14,518 gym                            INFO       <0.00> === STARTING STEP ===
2025-12-03 22:23:14,519 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-12-03 22:23:14,519 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-626) tasked for imaging
2025-12-03 22:23:14,520 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-626) window enabled: 224.4 to 374.7
2025-12-03 22:23:14,521 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 374.7
2025-12-03 22:23:14,522 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-12-03 22:23:14,522 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-709) tasked for imaging
2025-12-03 22:23:14,524 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-709) window enabled: 378.3 to 526.7
2025-12-03 22:23:14,524 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 526.7
2025-12-03 22:23:14,525 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-12-03 22:23:14,525 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-585) tasked for imaging
2025-12-03 22:23:14,526 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-585) window enabled: 226.1 to 419.4
2025-12-03 22:23:14,526 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 419.4
2025-12-03 22:23:14,578 sats.satellite.EO-1            INFO       <227.00> EO-1: imaged Target(tgt-626)
2025-12-03 22:23:14,579 data.base                      INFO       <227.00> Total reward: {'EO-1': 0.1144629147895686}
2025-12-03 22:23:14,580 sats.satellite.EO-1            INFO       <227.00> EO-1: Satellite EO-1 requires retasking
2025-12-03 22:23:14,582 sats.satellite.EO-3            INFO       <227.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-12-03 22:23:14,626 gym                            INFO       <227.00> Step reward: {'EO-1': 0.1144629147895686}
[14]:
observation
[14]:
{'EO-1': array([ 0.07189579, -0.01483751,  0.23255507, -0.03061893,  0.50890416,
        -0.01038893,  0.56293949, -0.00186763,  0.75887705,  0.0132369 ,
         0.47365826,  0.00860483,  0.31216624,  0.00678843,  0.95412528,
         0.02299343,  0.62137268,  0.02199646,  0.04388798,  0.03007908]),
 'EO-2': array([ 3.66160025e-01, -3.67282269e-03,  9.22112538e-01, -2.05308721e-02,
         2.39032736e-01, -2.28982066e-02,  4.91996670e-01, -2.03244832e-02,
         3.16487420e-01, -4.27624845e-04,  8.85069028e-01,  1.17551299e-02,
         8.53054754e-01,  2.65509980e-02,  2.49670324e-01,  2.61489948e-02,
         9.84681461e-01,  5.83147360e-02,  9.00128436e-01,  5.34884411e-02]),
 'EO-3': array([ 5.80984430e-01, -2.77058878e-02,  1.50499606e-01, -8.79549320e-03,
         3.76049394e-01, -1.61303367e-04,  5.00481851e-01,  1.49813450e-02,
         3.15085500e-01,  2.04100679e-02,  9.14580972e-01,  1.79115373e-02,
         3.56357448e-01,  3.31466622e-02,  4.81854516e-01,  4.42189764e-02,
         5.37039480e-01,  5.90957489e-02,  5.00813915e-01,  8.14912476e-02])}

Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.

[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2025-12-03 22:23:14,642 gym                            INFO       <227.00> === STARTING STEP ===
2025-12-03 22:23:14,643 sats.satellite.EO-1            INFO       <227.00> EO-1: target index 7 tasked
2025-12-03 22:23:14,643 sats.satellite.EO-1            INFO       <227.00> EO-1: Target(tgt-261) tasked for imaging
2025-12-03 22:23:14,644 sats.satellite.EO-1            INFO       <227.00> EO-1: Target(tgt-261) window enabled: 358.1 to 502.4
2025-12-03 22:23:14,644 sats.satellite.EO-1            INFO       <227.00> EO-1: setting timed terminal event at 502.4
2025-12-03 22:23:14,646 sats.satellite.EO-2            INFO       <227.00> EO-2: target index 9 tasked
2025-12-03 22:23:14,647 sats.satellite.EO-2            INFO       <227.00> EO-2: Target(tgt-742) tasked for imaging
2025-12-03 22:23:14,647 sats.satellite.EO-2            INFO       <227.00> EO-2: Target(tgt-742) window enabled: 531.9 to 600.0
2025-12-03 22:23:14,648 sats.satellite.EO-2            INFO       <227.00> EO-2: setting timed terminal event at 600.0
2025-12-03 22:23:14,650 sats.satellite.EO-3            INFO       <229.00> EO-3: imaged Target(tgt-585)
2025-12-03 22:23:14,651 data.base                      INFO       <229.00> Total reward: {'EO-3': 0.37604939364190526}
2025-12-03 22:23:14,651 sats.satellite.EO-3            INFO       <229.00> EO-3: Satellite EO-3 requires retasking
2025-12-03 22:23:14,654 gym                            INFO       <229.00> Step reward: {'EO-1': -1.0, 'EO-3': 0.37604939364190526}
2025-12-03 22:23:14,655 gym                            INFO       <229.00> Episode terminated: ['EO-1']
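Because only EO-1 terminated, later steps only need actions for the agents that are still active. A minimal sketch of that filtering is shown below, using the dictionary conventions of the PettingZoo parallel API. The helper actions_for_live_agents is hypothetical, and the terminated and truncated values are illustrative, chosen to match the "Episode terminated: ['EO-1']" log line rather than copied from actual output.

```python
def actions_for_live_agents(agent_names, terminated, truncated, choose_action):
    """Return an action dict that skips terminated or truncated agents."""
    actions = {}
    for name in agent_names:
        done = terminated.get(name, False) or truncated.get(name, False)
        if not done:
            actions[name] = choose_action(name)
    return actions


# Illustrative per-agent flags consistent with the log above.
terminated = {"EO-1": True, "EO-2": False, "EO-3": False}
truncated = {"EO-1": False, "EO-2": False, "EO-3": False}

# Placeholder policy: always pick the nearest upcoming target (index 0).
next_actions = actions_for_live_agents(
    ["EO-1", "EO-2", "EO-3"], terminated, truncated, lambda name: 0
)
print(next_actions)
```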