Multi-Agent Environments

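Two multi-agent environments are provided in the package: GeneralSatelliteTasking, which follows the Gymnasium API and passes per-satellite observations and actions as tuples, and ConstellationTasking, which implements the PettingZoo parallel API.

The latter is preferable for multi-agent RL (MARL) settings, as most MARL algorithms are designed for this kind of API.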

Configuring the Environment

For this example, a multi-satellite target imaging environment is used. The goal is to maximize the value of unique images taken.
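To make "value of unique images" concrete, here is a simplified sketch of the idea. It assumes each target has a priority value and that only the first image of a target, by any satellite, earns that value, while repeat images earn nothing. `UniqueImageTally` is a hypothetical helper for illustration, not the implementation of `data.UniqueImageReward`.

```python
class UniqueImageTally:
    """Hypothetical helper: a target's priority is paid only on its first image."""

    def __init__(self, priorities):
        self.priorities = dict(priorities)  # target name -> value of a first image
        self.imaged = set()  # targets already imaged by any satellite

    def reward(self, target):
        """Return the reward for imaging `target` now."""
        if target in self.imaged:
            return 0.0  # duplicate image, regardless of which satellite took it
        self.imaged.add(target)
        return self.priorities[target]


tally = UniqueImageTally({"tgt-1": 0.4, "tgt-2": 0.7})
print(tally.reward("tgt-1"))  # 0.4  (first image, e.g. by EO-1)
print(tally.reward("tgt-1"))  # 0.0  (same target again, e.g. by EO-2)
print(tally.reward("tgt-2"))  # 0.7
```

Because the tally is shared, satellites effectively compete for targets, which is what makes coordination worthwhile in this environment.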

As usual, the satellite type is defined first.

[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give the satellites near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is created and passed to the environment. In this case, it distributes the satellites in a trivial single-plane Walker-delta constellation.

[2]:
from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
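For intuition about the constellation geometry, the sketch below computes the standard Walker-delta t/p/f layout: planes evenly spread in right ascension of the ascending node (RAAN), satellites evenly spaced within each plane, and an inter-plane phase offset of 360·f/t degrees. It is an independent illustration, not the source of `walker_delta_args`, whose internals and returned argument names are not shown here. `correlated_randomizer` and its `raan`/`arg_latitude` keys are hypothetical; they only show the general shape of a correlated randomizer, which sees all satellites at once and applies one shared random draw so the relative spacing survives.

```python
import random


def walker_delta_slots(n_sats, n_planes, phasing=0):
    """(RAAN, argument of latitude) in degrees for a Walker-delta t/p/f pattern."""
    if n_sats % n_planes:
        raise ValueError("n_sats must be divisible by n_planes")
    per_plane = n_sats // n_planes
    slots = []
    for plane in range(n_planes):
        raan = 360.0 * plane / n_planes  # planes evenly spread in RAAN
        for k in range(per_plane):
            # Even in-plane spacing plus the inter-plane phase offset 360*f/t
            u = 360.0 * k / per_plane + 360.0 * phasing * plane / n_sats
            slots.append((raan, u % 360.0))
    return slots


def correlated_randomizer(sat_names, n_planes=1, rng=random):
    """Hypothetical randomizer: one shared random rotation for all satellites."""
    offset = rng.uniform(0.0, 360.0)  # a single draw preserves relative spacing
    slots = walker_delta_slots(len(sat_names), n_planes)
    return {
        name: {"raan": raan, "arg_latitude": (u + offset) % 360.0}
        for name, (raan, u) in zip(sat_names, slots)
    }


print(walker_delta_slots(3, 1))  # [(0.0, 0.0), (0.0, 120.0), (0.0, 240.0)]
print(correlated_randomizer(["EO-1", "EO-2", "EO-3"], rng=random.Random(0)))
```

With `n_planes=1`, as in this example, every satellite shares one orbital plane and the three satellites sit 120° apart.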

Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.

[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Requires dyn_type to inherit from dyn.LOSCommDynModel
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
2026-05-19 20:29:30,227 gym                            INFO       Resetting environment with seed=986115960
2026-05-19 20:29:30,230 scene.targets                  INFO       Generating 1000 targets
2026-05-19 20:29:30,298 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2026-05-19 20:29:30,332 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2026-05-19 20:29:30,367 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2026-05-19 20:29:30,403 gym                            INFO       <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a sequence with one entry per satellite, in the order the satellites were given. The step ends as soon as any satellite completes its action.

[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2026-05-19 20:29:30,415 gym                            INFO       <0.00> === STARTING STEP ===
2026-05-19 20:29:30,416 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2026-05-19 20:29:30,416 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-259) tasked for imaging
2026-05-19 20:29:30,417 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-259) window enabled: 193.8 to 400.5
2026-05-19 20:29:30,418 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 400.5
2026-05-19 20:29:30,419 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2026-05-19 20:29:30,419 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-630) tasked for imaging
2026-05-19 20:29:30,420 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-630) window enabled: 274.1 to 432.4
2026-05-19 20:29:30,421 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 432.4
2026-05-19 20:29:30,421 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2026-05-19 20:29:30,422 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-541) tasked for imaging
2026-05-19 20:29:30,423 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-541) window enabled: 400.5 to 581.6
2026-05-19 20:29:30,423 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 581.6
2026-05-19 20:29:30,474 sats.satellite.EO-1            INFO       <196.00> EO-1: imaged Target(tgt-259)
2026-05-19 20:29:30,475 data.base                      INFO       <196.00> Total reward: {'EO-1': 0.3953655843250563}
2026-05-19 20:29:30,475 sats.satellite.EO-1            INFO       <196.00> EO-1: Satellite EO-1 requires retasking
2026-05-19 20:29:30,476 sats.satellite.EO-1            INFO       <196.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-05-19 20:29:30,516 gym                            INFO       <196.00> Step reward: 0.3953655843250563
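Conceptually, each satellite's current action finishes at some time (an image is taken or its window closes), and a step advances the shared clock only to the earliest of these. The sketch below assumes those completion times are known in advance, which is a simplification: in the real environment they come out of the simulation. `advance_to_first_completion` is a hypothetical helper, and only EO-1's 196.0 s is taken from the log above; the other times are made up.

```python
def advance_to_first_completion(now, completion_times):
    """Advance a shared clock to the earliest action completion.

    completion_times maps satellite name -> absolute time (s) its action ends.
    Returns (new_time, elapsed, requires_retasking).
    """
    new_time = min(completion_times.values())
    # Only satellites whose action is finished by new_time need a new task
    requires_retasking = {
        name: t_done <= new_time for name, t_done in completion_times.items()
    }
    return new_time, new_time - now, requires_retasking


# Hypothetical completion times; only EO-1's 196.0 s matches the log above
print(advance_to_first_completion(0.0, {"EO-1": 196.0, "EO-2": 280.0, "EO-3": 410.0}))
# (196.0, 196.0, {'EO-1': True, 'EO-2': False, 'EO-3': False})
```

The returned elapsed time and flags play the same role as `d_ts` and `requires_retasking` in the `info` dictionary shown next.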
[6]:
observation
[6]:
(array([ 0.72534076, -0.02489984,  0.26884405, -0.01628331,  0.86900513,
        -0.01020251,  0.66039315,  0.0302466 ,  0.64537845,  0.06924374,
         0.73550828,  0.06478752,  0.20670747,  0.0790755 ,  0.78172605,
         0.07706475,  0.87444427,  0.09055819,  0.34683973,  0.08514924]),
 array([ 0.32339267, -0.01779771,  0.63288115, -0.00684584,  0.40888213,
        -0.00439092,  0.74739081,  0.01610123,  0.4285398 ,  0.01369686,
         0.49678816,  0.00730153,  0.2863464 ,  0.0101031 ,  0.17725107,
         0.02201616,  0.65787583,  0.04882878,  0.71440445,  0.05597138]),
 array([ 0.16463002, -0.00519807,  0.11733033,  0.00695053,  0.08092027,
         0.01097047,  0.82478401,  0.02482425,  0.708638  ,  0.01243765,
         0.23996745,  0.01339228,  0.37748072,  0.03587805,  0.16080077,
         0.06303182,  0.61538618,  0.04830295,  0.42336447,  0.05430929]))
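Each 20-element observation corresponds to the `observation_spec` defined earlier: 10 upcoming targets, each contributing a `priority` and an `opportunity_open` value, the latter divided by `norm=5700.0`. The numbers above suggest that the two properties are interleaved per target and that `opportunity_open` is measured in seconds relative to the current simulation time. Both points are assumptions read off this output, not documented guarantees. Under those assumptions, EO-2's observation decodes as follows (`decode` is a hypothetical helper):

```python
# EO-2's observation at t = 196.0 s, copied from the output above
eo2_obs = [
    0.32339267, -0.01779771, 0.63288115, -0.00684584, 0.40888213,
    -0.00439092, 0.74739081, 0.01610123, 0.4285398, 0.01369686,
    0.49678816, 0.00730153, 0.2863464, 0.0101031, 0.17725107,
    0.02201616, 0.65787583, 0.04882878, 0.71440445, 0.05597138,
]

NORM = 5700.0  # matches norm=5700.0 in the observation_spec
SIM_TIME = 196.0  # simulation time when the observation was taken


def decode(obs, sim_time, norm=NORM):
    """Split a flat observation into (priority, absolute opening time) per target."""
    pairs = zip(obs[0::2], obs[1::2])  # assumes an interleaved layout
    return [(prio, sim_time + rel * norm) for prio, rel in pairs]


targets = decode(eo2_obs, SIM_TIME)
prio, opens_at = targets[7]
print(f"{prio:.8f} {opens_at:.1f}")  # 0.17725107 321.5
```

This reading is consistent with the rest of the run: EO-2 is later tasked with target index 7, Target(tgt-79), whose logged window starts at 321.5, and the reward it earns for that image (0.17725106982234673) matches the decoded priority.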

At this point, every satellite can be retasked, or individual satellites can continue their previous action by passing None as their action. To see which satellites must be retasked (i.e., their previous action is done and they have nothing more to do), check "requires_retasking" in each satellite’s info.

[7]:
info
[7]:
{'EO-1': {'requires_retasking': True},
 'EO-2': {'requires_retasking': False},
 'EO-3': {'requires_retasking': False},
 'd_ts': 196.0}

Based on this information, only the satellite that needs it is retasked here; the others are passed None.

[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[0, None, None]
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2026-05-19 20:29:30,536 gym                            INFO       <196.00> === STARTING STEP ===
2026-05-19 20:29:30,537 sats.satellite.EO-1            INFO       <196.00> EO-1: target index 0 tasked
2026-05-19 20:29:30,537 sats.satellite.EO-1            INFO       <196.00> EO-1: Target(tgt-44) tasked for imaging
2026-05-19 20:29:30,538 sats.satellite.EO-1            INFO       <196.00> EO-1: Target(tgt-44) window enabled: 54.1 to 223.1
2026-05-19 20:29:30,538 sats.satellite.EO-1            INFO       <196.00> EO-1: setting timed terminal event at 223.1
2026-05-19 20:29:30,547 sats.satellite.EO-1            INFO       <224.00> EO-1: timed termination at 223.1 for Target(tgt-44) window
2026-05-19 20:29:30,548 data.base                      INFO       <224.00> Total reward: {}
2026-05-19 20:29:30,548 sats.satellite.EO-1            INFO       <224.00> EO-1: Satellite EO-1 requires retasking
2026-05-19 20:29:30,551 gym                            INFO       <224.00> Step reward: 0.0

In GeneralSatelliteTasking, the whole episode terminates if any satellite dies. To demonstrate this, one satellite is forcibly killed.

[10]:
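from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying by draining its battery."""
    # Looks up the global `env` when called, so it also works for later environments
    sat = env.unwrapped.satellites[0]
    # Overwrite the battery status message with an empty battery
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    sat.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    # Run the normal health checks, which now fail the battery check
    return sat.dynamics.is_alive(log_failure=log_failure) and sat.fsw.is_alive(
        log_failure=log_failure
    )

# Replace satellite 0's health check with the mock, then step
env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])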

2026-05-19 20:29:30,556 gym                            INFO       <224.00> === STARTING STEP ===
2026-05-19 20:29:30,557 sats.satellite.EO-1            INFO       <224.00> EO-1: target index 6 tasked
2026-05-19 20:29:30,557 sats.satellite.EO-1            INFO       <224.00> EO-1: Target(tgt-924) tasked for imaging
2026-05-19 20:29:30,558 sats.satellite.EO-1            INFO       <224.00> EO-1: Target(tgt-924) window enabled: 635.3 to 762.4
2026-05-19 20:29:30,558 sats.satellite.EO-1            INFO       <224.00> EO-1: setting timed terminal event at 762.4
2026-05-19 20:29:30,559 sats.satellite.EO-2            INFO       <224.00> EO-2: target index 7 tasked
2026-05-19 20:29:30,559 sats.satellite.EO-2            INFO       <224.00> EO-2: Target(tgt-79) tasked for imaging
2026-05-19 20:29:30,560 sats.satellite.EO-2            INFO       <224.00> EO-2: Target(tgt-79) window enabled: 321.5 to 524.7
2026-05-19 20:29:30,560 sats.satellite.EO-2            INFO       <224.00> EO-2: setting timed terminal event at 524.7
2026-05-19 20:29:30,561 sats.satellite.EO-3            INFO       <224.00> EO-3: target index 9 tasked
2026-05-19 20:29:30,562 sats.satellite.EO-3            INFO       <224.00> EO-3: Target(tgt-658) tasked for imaging
2026-05-19 20:29:30,562 sats.satellite.EO-3            INFO       <224.00> EO-3: Target(tgt-658) window enabled: 505.6 to 600.0
2026-05-19 20:29:30,563 sats.satellite.EO-3            INFO       <224.00> EO-3: setting timed terminal event at 600.0
2026-05-19 20:29:30,589 sats.satellite.EO-2            INFO       <324.00> EO-2: imaged Target(tgt-79)
2026-05-19 20:29:30,590 data.base                      INFO       <324.00> Total reward: {'EO-2': 0.17725106982234673}
2026-05-19 20:29:30,590 sats.satellite.EO-2            INFO       <324.00> EO-2: Satellite EO-2 requires retasking
2026-05-19 20:29:30,591 sats.satellite.EO-2            INFO       <324.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-05-19 20:29:30,643 sats.satellite.EO-1            WARNING    <324.00> EO-1: failed battery_valid check
2026-05-19 20:29:30,644 gym                            INFO       <324.00> Step reward: -0.8227489301776533
2026-05-19 20:29:30,644 gym                            INFO       <324.00> Episode terminated: True
2026-05-19 20:29:30,645 gym                            INFO       <324.00> Episode truncated: False
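The step reward of -0.8227... is consistent with adding EO-2's image reward to a penalty of -1.0 for the failed satellite EO-1; the same -1.0 appears for EO-1 in the PettingZoo run below. Treating the penalty as a fixed -1.0 is an inference from these logs, checked here:

```python
import math

image_reward_eo2 = 0.17725106982234673  # EO-2's "Total reward" entry in the log
failure_penalty = -1.0  # inferred from the logs; assumed to apply per failed satellite
step_reward = image_reward_eo2 + failure_penalty
print(math.isclose(step_reward, -0.8227489301776533))  # True
```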

PettingZoo API

The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking; see the PettingZoo documentation for a full description of the parallel API. The main difference is that observations, actions, rewards, and other per-agent values are dictionaries keyed by agent name rather than tuples.
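Because the two APIs carry the same per-satellite values in different containers, converting between them is mechanical. The sketch below assumes the tuple order matches the agent order, which appears to hold in this example since both follow the `satellites` list. Both helpers are hypothetical, not part of bsk_rl.

```python
def tuple_to_dict(agents, values):
    """Key a per-satellite tuple (Gym style) by agent name (PettingZoo style)."""
    if len(agents) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agents, values))


def dict_to_tuple(agents, mapping, default=None):
    """Order a per-agent dict as a tuple; missing agents get `default`."""
    return tuple(mapping.get(agent, default) for agent in agents)


agents = ["EO-1", "EO-2", "EO-3"]
print(tuple_to_dict(agents, (7, 9, 8)))  # {'EO-1': 7, 'EO-2': 9, 'EO-3': 8}
print(dict_to_tuple(agents, {"EO-1": 0}))  # (0, None, None)
```

Filling missing agents with None mirrors the tuple API's convention for letting a satellite continue its previous action.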

[11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Requires dyn_type to inherit from dyn.LOSCommDynModel
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
2026-05-19 20:29:30,652                                WARNING    Creating logger for new env on PID=4395. Old environments in process may now log times incorrectly.
2026-05-19 20:29:30,654 gym                            INFO       Resetting environment with seed=1412607029
2026-05-19 20:29:30,656 scene.targets                  INFO       Generating 1000 targets
2026-05-19 20:29:30,689 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2026-05-19 20:29:30,726 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2026-05-19 20:29:30,762 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2026-05-19 20:29:30,803 gym                            INFO       <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the agents property.

[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2026-05-19 20:29:30,814 gym                            INFO       <0.00> === STARTING STEP ===
2026-05-19 20:29:30,814 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2026-05-19 20:29:30,814 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-480) tasked for imaging
2026-05-19 20:29:30,815 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-480) window enabled: 90.2 to 236.7
2026-05-19 20:29:30,816 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 236.7
2026-05-19 20:29:30,816 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2026-05-19 20:29:30,817 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-676) tasked for imaging
2026-05-19 20:29:30,817 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-676) window enabled: 299.9 to 461.9
2026-05-19 20:29:30,818 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 461.9
2026-05-19 20:29:30,819 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2026-05-19 20:29:30,819 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-195) tasked for imaging
2026-05-19 20:29:30,820 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-195) window enabled: 131.7 to 332.7
2026-05-19 20:29:30,820 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 332.7
2026-05-19 20:29:30,840 sats.satellite.EO-1            INFO       <93.00> EO-1: imaged Target(tgt-480)
2026-05-19 20:29:30,841 data.base                      INFO       <93.00> Total reward: {'EO-1': 0.7060713096245507}
2026-05-19 20:29:30,841 sats.satellite.EO-1            INFO       <93.00> EO-1: Satellite EO-1 requires retasking
2026-05-19 20:29:30,845 gym                            INFO       <93.00> Step reward: {'EO-1': 0.7060713096245507}
[14]:
observation
[14]:
{'EO-1': array([ 0.89440326, -0.01631579,  0.27299245, -0.01077397,  0.66186383,
        -0.01631579,  0.84700055, -0.01631579,  0.05380567, -0.01631579,
         0.1727596 ,  0.00579201,  0.38227638,  0.02283473,  0.74101799,
         0.0194658 ,  0.0456438 ,  0.05247228,  0.71245699,  0.08077344]),
 'EO-2': array([ 0.99443799, -0.00802161,  0.09877264,  0.00389612,  0.04392426,
         0.01743539,  0.26346661,  0.02031343,  0.56174605,  0.05154918,
         0.50189443,  0.0282038 ,  0.4225443 ,  0.03837597,  0.31099963,
         0.03630247,  0.6447967 ,  0.02892887,  0.2905543 ,  0.06153436]),
 'EO-3': array([ 0.69747148, -0.01143501,  0.72105094, -0.01631579,  0.96783275,
         0.00309007,  0.10655504, -0.01631579,  0.189789  , -0.00632669,
         0.38362972,  0.00280303,  0.52188217,  0.00678178,  0.03899653,
         0.00812659,  0.58447659,  0.01703968,  0.88086894,  0.01099811])}

Besides compatibility with MARL algorithms, the main benefit of the PettingZoo API is that individual agents can fail without terminating the entire environment.

[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
# EO-3 is omitted from the dict; the log below shows it continuing its previous task
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2026-05-19 20:29:30,860 gym                            INFO       <93.00> === STARTING STEP ===
2026-05-19 20:29:30,861 sats.satellite.EO-1            INFO       <93.00> EO-1: target index 7 tasked
2026-05-19 20:29:30,861 sats.satellite.EO-1            INFO       <93.00> EO-1: Target(tgt-909) tasked for imaging
2026-05-19 20:29:30,862 sats.satellite.EO-1            INFO       <93.00> EO-1: Target(tgt-909) window enabled: 204.0 to 405.0
2026-05-19 20:29:30,862 sats.satellite.EO-1            INFO       <93.00> EO-1: setting timed terminal event at 405.0
2026-05-19 20:29:30,863 sats.satellite.EO-2            INFO       <93.00> EO-2: target index 9 tasked
2026-05-19 20:29:30,864 sats.satellite.EO-2            INFO       <93.00> EO-2: Target(tgt-50) tasked for imaging
2026-05-19 20:29:30,864 sats.satellite.EO-2            INFO       <93.00> EO-2: Target(tgt-50) window enabled: 443.7 to 600.0
2026-05-19 20:29:30,865 sats.satellite.EO-2            INFO       <93.00> EO-2: setting timed terminal event at 600.0
2026-05-19 20:29:30,875 sats.satellite.EO-3            INFO       <134.00> EO-3: imaged Target(tgt-195)
2026-05-19 20:29:30,875 data.base                      INFO       <134.00> Total reward: {'EO-3': 0.5218821736595285}
2026-05-19 20:29:30,876 sats.satellite.EO-3            INFO       <134.00> EO-3: Satellite EO-3 requires retasking
2026-05-19 20:29:30,878 sats.satellite.EO-1            INFO       <134.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-05-19 20:29:30,934 gym                            INFO       <134.00> Step reward: {'EO-1': -1.0, 'EO-3': 0.5218821736595285}
2026-05-19 20:29:30,934 gym                            INFO       <134.00> Episode terminated: ['EO-1']
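In the PettingZoo parallel API, agents that terminate or truncate are removed from `agents`, so the usual control loop keeps stepping while any agent remains and builds actions only for the agents still present. Given that EO-1 was reported as terminated above, the same pattern presumably applies to ConstellationTasking. The `ToyParallelEnv` class below is a hypothetical stand-in written for illustration; it does no simulation and only reproduces that bookkeeping so the loop can run on its own.

```python
import random


class ToyParallelEnv:
    """Toy stand-in mimicking PettingZoo parallel bookkeeping; no simulation."""

    def __init__(self, agents, fail_prob=0.2, max_steps=10, seed=0):
        self.possible_agents = list(agents)
        self.fail_prob = fail_prob
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.agents = []
        self.t = 0

    def reset(self):
        self.agents = list(self.possible_agents)
        self.t = 0
        return {a: 0.0 for a in self.agents}, {a: {} for a in self.agents}

    def step(self, actions):
        unknown = set(actions) - set(self.agents)
        if unknown:
            raise ValueError(f"actions given for inactive agents: {unknown}")
        self.t += 1
        obs, rewards, terminated, truncated, infos = {}, {}, {}, {}, {}
        for a in self.agents:
            failed = self.rng.random() < self.fail_prob
            obs[a] = float(self.t)
            rewards[a] = -1.0 if failed else 0.0  # -1.0 echoes the failure reward above
            terminated[a] = failed
            truncated[a] = self.t >= self.max_steps
            infos[a] = {}
        # Agents that terminated or truncated this step drop out of `agents`
        self.agents = [a for a in self.agents if not (terminated[a] or truncated[a])]
        return obs, rewards, terminated, truncated, infos


toy_env = ToyParallelEnv(["EO-1", "EO-2", "EO-3"])
observations, infos = toy_env.reset()
returns = dict.fromkeys(toy_env.possible_agents, 0.0)
policy_rng = random.Random(1)
while toy_env.agents:  # keep stepping while at least one agent is active
    actions = {agent: policy_rng.randrange(10) for agent in toy_env.agents}
    observations, rewards, terminated, truncated, infos = toy_env.step(actions)
    for agent, r in rewards.items():
        returns[agent] += r
print(returns)
```

As in the log above, a failed agent still appears in the reward dictionary for the step in which it fails, then drops out of the active agent list.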