Multi-Agent Environments
The package provides two multi-agent environments:
GeneralSatelliteTasking, a Gymnasium-based environment and the basis for all other environments.
ConstellationTasking, which implements the PettingZoo parallel API.
The latter is preferable for multi-agent RL (MARL) settings, as most MARL algorithms are written against this kind of API.
Configuring the Environment
For this example, a multi-satellite target imaging environment is used. The goal is to maximize the total value of unique images taken.
As usual, the satellite type is defined first.
[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw
class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel
Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.
[2]:
from bsk_rl.utils.orbital import walker_delta_args
sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
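To make the single-plane Walker-delta layout concrete, the following is a minimal sketch of how such a constellation spaces its satellites. The helper `walker_delta_elements` and its parameters are hypothetical and written only for illustration; it is not the implementation behind `walker_delta_args`, which may additionally randomize parameters as described above.

```python
def walker_delta_elements(n_sats, n_planes=1, phasing=0, inc_deg=60.0):
    """Return (inclination, RAAN, true anomaly) in degrees for each satellite.

    Hypothetical helper for illustration only; not part of bsk_rl.
    """
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    per_plane = n_sats // n_planes
    elements = []
    for plane in range(n_planes):
        # Planes are spread evenly in right ascension of the ascending node.
        raan = 360.0 * plane / n_planes
        for slot in range(per_plane):
            # Even in-plane spacing plus the inter-plane phase offset.
            anomaly = (
                360.0 * slot / per_plane + 360.0 * phasing * plane / n_sats
            ) % 360.0
            elements.append((inc_deg, raan, anomaly))
    return elements


single_plane = walker_delta_elements(n_sats=3, n_planes=1)
for name, (inc, raan, anomaly) in zip(["EO-1", "EO-2", "EO-3"], single_plane):
    print(f"{name}: inc={inc} deg, RAAN={raan} deg, true anomaly={anomaly} deg")
```

With one plane, all satellites share the same inclination and RAAN and differ only in true anomaly, which is why this configuration is described as trivial.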
Gym API
GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.
[3]:
from bsk_rl import GeneralSatelliteTasking
env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    # Note that the satellites' dyn_type must inherit from dyn.LOSCommunication
    communicator=comm.LOSCommunication(),
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()
env.observation_space
2026-02-25 01:07:20,865 gym INFO Resetting environment with seed=4233375989
2026-02-25 01:07:20,868 scene.targets INFO Generating 1000 targets
2026-02-25 01:07:21,017 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2026-02-25 01:07:21,055 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-02-25 01:07:21,116 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2026-02-25 01:07:21,152 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-02-25 01:07:21,194 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2026-02-25 01:07:21,226 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-02-25 01:07:21,265 gym INFO <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))
Consequently, actions are passed as a tuple (or list) with one entry per satellite. The step ends the first time any satellite completes its action.
[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2026-02-25 01:07:21,279 gym INFO <0.00> === STARTING STEP ===
2026-02-25 01:07:21,280 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2026-02-25 01:07:21,281 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-14) tasked for imaging
2026-02-25 01:07:21,281 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-14) window enabled: 520.1 to 714.2
2026-02-25 01:07:21,282 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 714.2
2026-02-25 01:07:21,283 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2026-02-25 01:07:21,284 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-960) tasked for imaging
2026-02-25 01:07:21,284 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-960) window enabled: 782.0 to 981.6
2026-02-25 01:07:21,285 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 981.6
2026-02-25 01:07:21,286 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2026-02-25 01:07:21,287 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-881) tasked for imaging
2026-02-25 01:07:21,287 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-881) window enabled: 872.6 to 1080.0
2026-02-25 01:07:21,288 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 1080.0
2026-02-25 01:07:21,402 sats.satellite.EO-1 INFO <523.00> EO-1: imaged Target(tgt-14)
2026-02-25 01:07:21,403 data.base INFO <523.00> Total reward: {'EO-1': 0.34673414424771276}
2026-02-25 01:07:21,403 sats.satellite.EO-1 INFO <523.00> EO-1: Satellite EO-1 requires retasking
2026-02-25 01:07:21,405 sats.satellite.EO-3 INFO <523.00> EO-3: Finding opportunity windows from 1200.00 to 1800.00 seconds
2026-02-25 01:07:21,450 gym INFO <523.00> Step reward: 0.34673414424771276
[6]:
observation
[6]:
(array([ 0.36089961, -0.0097861 , 0.28295936, 0.02153546, 0.46031924,
0.03205483, 0.43809311, 0.04386556, 0.39803192, 0.05255721,
0.62898963, 0.06052263, 0.62137229, 0.04069113, 0.71246305,
0.05438356, 0.59673673, 0.05630331, 0.06398604, 0.0584411 ]),
array([ 0.80439199, -0.02156333, 0.4911117 , -0.01264564, 0.05554125,
0.02603133, 0.74413301, 0.01359147, 0.17670795, 0.03572282,
0.62658804, 0.04543466, 0.80160002, 0.05421977, 0.00193374,
0.07346089, 0.95072509, 0.0710411 , 0.99278363, 0.08564298]),
array([ 0.29916333, -0.01693327, 0.11943385, 0.00318661, 0.08374258,
0.06189349, 0.85314552, 0.07385305, 0.69898258, 0.05945733,
0.06180556, 0.06133567, 0.76812529, 0.07020479, 0.23407476,
0.07885335, 0.00969239, 0.09445883, 0.77859104, 0.13510183]))
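Each observation has 20 entries because `observation_spec` requests two properties for each of the next 10 opportunities. The sketch below unpacks one satellite's vector, assuming the entries are interleaved as (priority, normalized opening time) per target; that layout is inferred from the printed values rather than stated by the package. The helpers `decode_observation` and `greedy_action` are hypothetical names introduced for illustration.

```python
NORM_SECONDS = 5700.0  # norm passed for "opportunity_open" in observation_spec

# First satellite's observation, copied (rounded) from the output above.
eo1_obs = [
    0.36089961, -0.0097861, 0.28295936, 0.02153546, 0.46031924,
    0.03205483, 0.43809311, 0.04386556, 0.39803192, 0.05255721,
    0.62898963, 0.06052263, 0.62137229, 0.04069113, 0.71246305,
    0.05438356, 0.59673673, 0.05630331, 0.06398604, 0.0584411,
]


def decode_observation(flat_obs, norm=NORM_SECONDS):
    """Split a flat observation into (priority, seconds_until_open) pairs.

    Assumes an interleaved layout: even indices hold priorities and odd
    indices hold opening times divided by `norm`.
    """
    if len(flat_obs) % 2 != 0:
        raise ValueError("expected an even number of entries")
    priorities = flat_obs[0::2]
    opens_in = [value * norm for value in flat_obs[1::2]]
    return list(zip(priorities, opens_in))


def greedy_action(flat_obs):
    """Pick the index of the upcoming target with the highest priority."""
    pairs = decode_observation(flat_obs)
    return max(range(len(pairs)), key=lambda i: pairs[i][0])


pairs = decode_observation(eo1_obs)
print(pairs[0])                # priority and seconds until the window opens
print(greedy_action(eo1_obs))  # index of the highest-priority target
```

A negative opening time means the window is already open. A priority-greedy rule like this is only a baseline for comparison, not a recommended policy.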
At this point, either every satellite can be retasked, or some satellites can continue their previous action by passing None as their action. To see which satellites must be retasked (i.e., their previous action is complete and they have nothing left to do), check "requires_retasking" in each satellite’s info.
[7]:
info
[7]:
{'EO-1': {'requires_retasking': True},
'EO-2': {'requires_retasking': False},
'EO-3': {'requires_retasking': False},
'd_ts': 523.0}
Based on this dictionary, only the satellite that requires retasking is given a new action here.
[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[0, None, None]
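The same selection can be wrapped in a small helper that takes an arbitrary policy. This is a sketch only: `build_actions` and the `policy` callable are hypothetical names, and the `info` dictionary is copied from the output above so the snippet runs without the simulator.

```python
def build_actions(info, satellite_names, policy):
    """Return one action per satellite; None continues the current task."""
    actions = []
    for name in satellite_names:
        if info[name]["requires_retasking"]:
            actions.append(policy(name))
        else:
            actions.append(None)
    return actions


# info dictionary as printed above; note that "d_ts" is not a satellite entry,
# so the satellite names are supplied explicitly rather than taken from the keys.
info = {
    "EO-1": {"requires_retasking": True},
    "EO-2": {"requires_retasking": False},
    "EO-3": {"requires_retasking": False},
    "d_ts": 523.0,
}
names = ["EO-1", "EO-2", "EO-3"]

actions = build_actions(info, names, policy=lambda name: 0)
print(actions)
```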
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2026-02-25 01:07:21,471 gym INFO <523.00> === STARTING STEP ===
2026-02-25 01:07:21,472 sats.satellite.EO-1 INFO <523.00> EO-1: target index 0 tasked
2026-02-25 01:07:21,473 sats.satellite.EO-1 INFO <523.00> EO-1: Target(tgt-947) tasked for imaging
2026-02-25 01:07:21,473 sats.satellite.EO-1 INFO <523.00> EO-1: Target(tgt-947) window enabled: 467.2 to 675.0
2026-02-25 01:07:21,474 sats.satellite.EO-1 INFO <523.00> EO-1: setting timed terminal event at 675.0
2026-02-25 01:07:21,484 sats.satellite.EO-1 INFO <559.00> EO-1: imaged Target(tgt-947)
2026-02-25 01:07:21,484 data.base INFO <559.00> Total reward: {'EO-1': 0.36089960599901516}
2026-02-25 01:07:21,485 sats.satellite.EO-1 INFO <559.00> EO-1: Satellite EO-1 requires retasking
2026-02-25 01:07:21,488 gym INFO <559.00> Step reward: 0.36089960599901516
In this environment, the episode terminates if any agent dies. To demonstrate this, one satellite is forcibly killed.
[10]:
from Basilisk.architecture import messaging
def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )
env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])
2026-02-25 01:07:21,494 gym INFO <559.00> === STARTING STEP ===
2026-02-25 01:07:21,495 sats.satellite.EO-1 INFO <559.00> EO-1: target index 6 tasked
2026-02-25 01:07:21,496 sats.satellite.EO-1 INFO <559.00> EO-1: Target(tgt-506) tasked for imaging
2026-02-25 01:07:21,497 sats.satellite.EO-1 INFO <559.00> EO-1: Target(tgt-506) window enabled: 833.0 to 1041.0
2026-02-25 01:07:21,497 sats.satellite.EO-1 INFO <559.00> EO-1: setting timed terminal event at 1041.0
2026-02-25 01:07:21,498 sats.satellite.EO-2 INFO <559.00> EO-2: target index 7 tasked
2026-02-25 01:07:21,498 sats.satellite.EO-2 INFO <559.00> EO-2: Target(tgt-245) tasked for imaging
2026-02-25 01:07:21,499 sats.satellite.EO-2 INFO <559.00> EO-2: Target(tgt-245) window enabled: 927.9 to 1129.5
2026-02-25 01:07:21,500 sats.satellite.EO-2 INFO <559.00> EO-2: setting timed terminal event at 1129.5
2026-02-25 01:07:21,501 sats.satellite.EO-3 INFO <559.00> EO-3: target index 9 tasked
2026-02-25 01:07:21,501 sats.satellite.EO-3 INFO <559.00> EO-3: Target(tgt-633) tasked for imaging
2026-02-25 01:07:21,502 sats.satellite.EO-3 INFO <559.00> EO-3: Target(tgt-633) window enabled: 1293.1 to 1500.1
2026-02-25 01:07:21,503 sats.satellite.EO-3 INFO <559.00> EO-3: setting timed terminal event at 1500.1
2026-02-25 01:07:21,560 sats.satellite.EO-1 INFO <835.00> EO-1: imaged Target(tgt-506)
2026-02-25 01:07:21,561 data.base INFO <835.00> Total reward: {'EO-1': 0.7124630470978522}
2026-02-25 01:07:21,562 sats.satellite.EO-1 INFO <835.00> EO-1: Satellite EO-1 requires retasking
2026-02-25 01:07:21,562 sats.satellite.EO-1 INFO <835.00> EO-1: Finding opportunity windows from 1200.00 to 1800.00 seconds
2026-02-25 01:07:21,602 sats.satellite.EO-2 INFO <835.00> EO-2: Finding opportunity windows from 1200.00 to 1800.00 seconds
2026-02-25 01:07:21,653 sats.satellite.EO-1 WARNING <835.00> EO-1: failed battery_valid check
2026-02-25 01:07:21,654 gym INFO <835.00> Step reward: -0.28753695290214776
2026-02-25 01:07:21,655 gym INFO <835.00> Episode terminated: True
2026-02-25 01:07:21,655 gym INFO <835.00> Episode truncated: False
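The step reward logged above (about -0.2875) equals EO-1's imaging reward (about 0.7125) minus 1.0, which suggests that the scalar reward is the sum over satellites with a penalty of 1.0 for the failed one. The sketch below reproduces that arithmetic under this assumption; `aggregate_step` and `FAILURE_PENALTY` are hypothetical names, not the package's API.

```python
FAILURE_PENALTY = -1.0  # assumed from the logged values, not a documented constant


def aggregate_step(agent_rewards, agent_alive, failure_penalty=FAILURE_PENALTY):
    """Combine per-satellite results into a single reward and terminated flag."""
    dead = [name for name, alive in agent_alive.items() if not alive]
    total = sum(agent_rewards.values()) + failure_penalty * len(dead)
    # With a single shared episode, any dead satellite ends the episode.
    terminated = len(dead) > 0
    return total, terminated


total, terminated = aggregate_step(
    agent_rewards={"EO-1": 0.7124630470978522},
    agent_alive={"EO-1": False, "EO-2": True, "EO-3": True},
)
print(total, terminated)
```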
PettingZoo API
The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo documentation for a full description of the API. It passes actions, observations, rewards, and info as dictionaries keyed by agent name, rather than as tuples.
[11]:
from bsk_rl import ConstellationTasking
env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    # Note that the satellites' dyn_type must inherit from dyn.LOSCommunication
    communicator=comm.LOSCommunication(),
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()
env.observation_spaces
2026-02-25 01:07:21,663 WARNING Creating logger for new env on PID=3928. Old environments in process may now log times incorrectly.
2026-02-25 01:07:21,665 gym INFO Resetting environment with seed=738028542
2026-02-25 01:07:21,667 scene.targets INFO Generating 1000 targets
2026-02-25 01:07:21,790 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2026-02-25 01:07:21,840 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2026-02-25 01:07:21,884 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2026-02-25 01:07:21,924 gym INFO <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
'EO-2': Box(-1e+16, 1e+16, (20,), float64),
'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}
Actions are passed as a dictionary; the agent names can be accessed through the agents property.
[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2026-02-25 01:07:21,936 gym INFO <0.00> === STARTING STEP ===
2026-02-25 01:07:21,937 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2026-02-25 01:07:21,938 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-153) tasked for imaging
2026-02-25 01:07:21,939 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-153) window enabled: 202.5 to 410.8
2026-02-25 01:07:21,939 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 410.8
2026-02-25 01:07:21,940 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2026-02-25 01:07:21,941 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-639) tasked for imaging
2026-02-25 01:07:21,942 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-639) window enabled: 515.9 to 600.0
2026-02-25 01:07:21,942 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 600.0
2026-02-25 01:07:21,943 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2026-02-25 01:07:21,944 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-160) tasked for imaging
2026-02-25 01:07:21,944 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-160) window enabled: 410.2 to 589.8
2026-02-25 01:07:21,945 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 589.8
2026-02-25 01:07:21,989 sats.satellite.EO-1 INFO <205.00> EO-1: imaged Target(tgt-153)
2026-02-25 01:07:21,990 data.base INFO <205.00> Total reward: {'EO-1': 0.639546381968878}
2026-02-25 01:07:21,990 sats.satellite.EO-1 INFO <205.00> EO-1: Satellite EO-1 requires retasking
2026-02-25 01:07:21,992 sats.satellite.EO-1 INFO <205.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-02-25 01:07:22,031 sats.satellite.EO-3 INFO <205.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-02-25 01:07:22,092 gym INFO <205.00> Step reward: {'EO-1': 0.639546381968878}
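Code written for the tuple-based API can be adapted with small conversion helpers. The functions below are a sketch with hypothetical names (`to_action_dict`, `to_tuple`); they operate on plain Python containers and do not call the environment.

```python
def to_action_dict(agents, actions):
    """Pair an ordered sequence of actions with agent names."""
    if len(agents) != len(actions):
        raise ValueError("need exactly one action per agent")
    return dict(zip(agents, actions))


def to_tuple(agents, per_agent, default=None):
    """Order a per-agent dictionary as a tuple, filling in missing agents."""
    return tuple(per_agent.get(agent, default) for agent in agents)


agents = ["EO-1", "EO-2", "EO-3"]

action_dict = to_action_dict(agents, [7, 9, 8])
print(action_dict)

# Per-agent rewards as they appear in the step log above.
reward_tuple = to_tuple(agents, {"EO-1": 0.639546381968878}, default=0.0)
print(reward_tuple)
```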
[14]:
observation
[14]:
{'EO-1': array([ 0.74852113, -0.00638632, 0.33600633, 0.01550497, 0.23388875,
0.01145628, 0.3357882 , 0.00441972, 0.32920879, 0.01596271,
0.23018306, 0.02581727, 0.19757871, 0.04846655, 0.0603808 ,
0.07559755, 0.39690142, 0.06939158, 0.3866219 , 0.11515096]),
'EO-2': array([ 0.90074654, 0.00121456, 0.92418392, -0.01755549, 0.82602159,
-0.01144103, 0.59232349, 0.01071019, 0.09860004, 0.01833728,
0.85925538, 0.03286212, 0.79394034, 0.05454878, 0.43287659,
0.04995058, 0.63840384, 0.05308071, 0.51068343, 0.05752068]),
'EO-3': array([ 0.21299728, -0.03358071, 0.00837615, -0.02875913, 0.39119406,
-0.02722272, 0.98990395, 0.02581346, 0.29028341, 0.03399302,
0.87046931, 0.03599444, 0.52231258, 0.04560772, 0.39905331,
0.0354892 , 0.12214175, 0.06262204, 0.21608037, 0.08732636])}
Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.
[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2026-02-25 01:07:22,108 gym INFO <205.00> === STARTING STEP ===
2026-02-25 01:07:22,108 sats.satellite.EO-1 INFO <205.00> EO-1: target index 7 tasked
2026-02-25 01:07:22,109 sats.satellite.EO-1 INFO <205.00> EO-1: Target(tgt-164) tasked for imaging
2026-02-25 01:07:22,110 sats.satellite.EO-1 INFO <205.00> EO-1: Target(tgt-164) window enabled: 635.9 to 725.2
2026-02-25 01:07:22,111 sats.satellite.EO-1 INFO <205.00> EO-1: setting timed terminal event at 725.2
2026-02-25 01:07:22,112 sats.satellite.EO-2 INFO <205.00> EO-2: target index 9 tasked
2026-02-25 01:07:22,112 sats.satellite.EO-2 INFO <205.00> EO-2: Target(tgt-972) tasked for imaging
2026-02-25 01:07:22,113 sats.satellite.EO-2 INFO <205.00> EO-2: Target(tgt-972) window enabled: 532.9 to 600.0
2026-02-25 01:07:22,114 sats.satellite.EO-2 INFO <205.00> EO-2: setting timed terminal event at 600.0
2026-02-25 01:07:22,158 sats.satellite.EO-3 INFO <413.00> EO-3: imaged Target(tgt-160)
2026-02-25 01:07:22,159 data.base INFO <413.00> Total reward: {'EO-3': 0.8704693136890076}
2026-02-25 01:07:22,159 sats.satellite.EO-3 INFO <413.00> EO-3: Satellite EO-3 requires retasking
2026-02-25 01:07:22,161 sats.satellite.EO-2 INFO <413.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-02-25 01:07:22,213 gym INFO <413.00> Step reward: {'EO-1': -1.0, 'EO-3': 0.8704693136890076}
2026-02-25 01:07:22,214 gym INFO <413.00> Episode terminated: ['EO-1']
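After an agent terminates, later steps should only supply actions for the agents that remain. The sketch below shows one way to filter them, assuming `terminated` and `truncated` are dictionaries keyed by agent name as in the PettingZoo parallel API; `remaining_agents`, `next_actions`, and the example values are hypothetical and chosen to mirror the log above.

```python
def remaining_agents(agents, terminated, truncated):
    """Return the agents that are neither terminated nor truncated."""
    return [
        agent
        for agent in agents
        if not (terminated.get(agent, False) or truncated.get(agent, False))
    ]


def next_actions(agents, terminated, truncated, policy):
    """Build an action dictionary for the agents that are still active."""
    active = remaining_agents(agents, terminated, truncated)
    return {agent: policy(agent) for agent in active}


# Example values mirroring the log above, where only EO-1 terminated.
agents = ["EO-1", "EO-2", "EO-3"]
terminated = {"EO-1": True, "EO-2": False, "EO-3": False}
truncated = {"EO-1": False, "EO-2": False, "EO-3": False}

actions = next_actions(agents, terminated, truncated, policy=lambda agent: 0)
print(actions)
```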