Multi-Agent Environments
The package provides two multi-agent environments:
GeneralSatelliteTasking, a Gymnasium-based environment and the basis for all other environments.
ConstellationTasking, which implements the PettingZoo parallel API.
The latter is preferable for multi-agent RL (MARL) settings, as most algorithms are designed for this kind of API.
Configuring the Environment
For this example, a multi-satellite target imaging environment is used. The goal is to maximize the total value of unique images taken.
As usual, the satellite type is defined first.
[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw
class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel
Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.
[2]:
from bsk_rl.utils.orbital import walker_delta_args
sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
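To make the geometry of a single-plane Walker-delta layout concrete, the sketch below spaces satellites evenly around one orbital plane. The function walker_delta_phasing and its parameter names are hypothetical and written only for illustration. They are not part of bsk_rl, and this is not a description of how walker_delta_args assigns orbital elements internally.

```python
def walker_delta_phasing(n_sats, n_planes=1, rel_phasing=0):
    """Return (RAAN, in-plane angle) in degrees for each satellite.

    Hypothetical helper for illustration only; not the bsk_rl implementation.
    """
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    per_plane = n_sats // n_planes
    layout = []
    for plane in range(n_planes):
        # Planes are spread evenly in right ascension of the ascending node.
        raan = 360.0 * plane / n_planes
        for slot in range(per_plane):
            # Even spacing within the plane, plus an offset between adjacent planes.
            angle = 360.0 * slot / per_plane + 360.0 * rel_phasing * plane / n_sats
            layout.append((raan, angle % 360.0))
    return layout


layout = walker_delta_phasing(3, n_planes=1)
print(layout)  # [(0.0, 0.0), (0.0, 120.0), (0.0, 240.0)]
```

With a single plane, the layout reduces to satellites following one another along the same orbit, 120 degrees apart for three satellites.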
Gym API
GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.
[3]:
from bsk_rl import GeneralSatelliteTasking
env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()
env.observation_space
2025-11-05 22:48:38,849 gym INFO Resetting environment with seed=1547192042
2025-11-05 22:48:38,851 scene.targets INFO Generating 1000 targets
2025-11-05 22:48:39,018 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-05 22:48:39,057 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-05 22:48:39,094 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-05 22:48:39,134 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-05 22:48:39,170 gym INFO <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))
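Consequently, actions are passed as a tuple (a list also works, as below) with one entry per satellite, in the same order as the satellites argument. The step ends the first time any satellite completes its action.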
[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2025-11-05 22:48:39,185 gym INFO <0.00> === STARTING STEP ===
2025-11-05 22:48:39,186 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2025-11-05 22:48:39,186 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-519) tasked for imaging
2025-11-05 22:48:39,187 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-519) window enabled: 614.1 to 645.0
2025-11-05 22:48:39,188 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 645.0
2025-11-05 22:48:39,189 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2025-11-05 22:48:39,190 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-811) tasked for imaging
2025-11-05 22:48:39,190 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-811) window enabled: 173.7 to 374.7
2025-11-05 22:48:39,191 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 374.7
2025-11-05 22:48:39,192 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2025-11-05 22:48:39,192 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-111) tasked for imaging
2025-11-05 22:48:39,193 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-111) window enabled: 437.1 to 600.0
2025-11-05 22:48:39,194 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 600.0
2025-11-05 22:48:39,235 sats.satellite.EO-2 INFO <176.00> EO-2: imaged Target(tgt-811)
2025-11-05 22:48:39,236 data.base INFO <176.00> Total reward: {'EO-2': 0.5079934949850267}
2025-11-05 22:48:39,237 sats.satellite.EO-2 INFO <176.00> EO-2: Satellite EO-2 requires retasking
2025-11-05 22:48:39,238 sats.satellite.EO-3 INFO <176.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-05 22:48:39,279 gym INFO <176.00> Step reward: 0.5079934949850267
[6]:
observation
[6]:
(array([ 0.71621771, -0.01943427, 0.63597323, -0.03087719, 0.56637792,
-0.01223017, 0.60716887, -0.01812702, 0.56614095, -0.01209571,
0.06557256, 0.02623297, 0.22177599, 0.07686464, 0.32943726,
0.08241237, 0.99494099, 0.09417084, 0.71819909, 0.13085989]),
array([ 0.02048901, -0.01775804, 0.06075115, -0.02341873, 0.49419605,
-0.01716148, 0.37942278, -0.01944149, 0.12317875, 0.00592013,
0.44738873, -0.00913504, 0.81575916, 0.02518977, 0.73497147,
0.03296025, 0.12923178, 0.01267443, 0.66896754, 0.03189218]),
array([ 0.18337927, -0.03087719, 0.65728279, -0.00812571, 0.45632543,
0.01313141, 0.63298028, 0.01593659, 0.41372112, 0.00767957,
0.81193785, 0.02570958, 0.28951229, 0.04579868, 0.87917222,
0.06156085, 0.57518884, 0.06539764, 0.25933596, 0.07536863]))
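Each 20-element observation appears to hold the two requested properties for each of the 10 upcoming opportunities. The sketch below decodes EO-2's vector under the assumption that the values are interleaved per target (priority, then opportunity_open divided by the 5700.0 norm). That layout is inferred from the outputs on this page rather than taken from the library, and decode_opportunities is a hypothetical helper.

```python
# Observation returned for EO-2 in the step above (values copied from the output).
eo2_obs = [
    0.02048901, -0.01775804, 0.06075115, -0.02341873, 0.49419605,
    -0.01716148, 0.37942278, -0.01944149, 0.12317875, 0.00592013,
    0.44738873, -0.00913504, 0.81575916, 0.02518977, 0.73497147,
    0.03296025, 0.12923178, 0.01267443, 0.66896754, 0.03189218,
]

OPEN_NORM = 5700.0  # norm used for "opportunity_open" in observation_spec
N_PROPS = 2  # priority and opportunity_open


def decode_opportunities(obs_vector, sim_time):
    """Split a flat observation into per-target (priority, window open time)."""
    decoded = []
    for k in range(len(obs_vector) // N_PROPS):
        priority = obs_vector[N_PROPS * k]
        opens_in = obs_vector[N_PROPS * k + 1] * OPEN_NORM  # seconds from now
        decoded.append((priority, sim_time + opens_in))
    return decoded


upcoming = decode_opportunities(eo2_obs, sim_time=176.0)
priority, opens_at = upcoming[0]
print(f"index 0: priority={priority:.3f}, window opens at {opens_at:.1f} s")
# index 0: priority=0.020, window opens at 74.8 s
```

A negative normalized value means the window has already opened. The decoded opening time of 74.8 s for index 0 agrees with the window logged when EO-2 is tasked with action 0 in the next step, which supports the assumed layout.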
At this point, either every satellite can be retasked, or satellites can continue their previous action by passing None as the action. To see which satellites must be retasked (i.e. their previous action is done and they have nothing more to do), look at "requires_retasking" in each satellite’s info.
[7]:
info
[7]:
{'EO-1': {'requires_retasking': False},
'EO-2': {'requires_retasking': True},
'EO-3': {'requires_retasking': False},
'd_ts': 176.0}
Based on this dictionary, only the satellite that requires retasking is given a new action here.
[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[None, 0, None]
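The same pattern can be wrapped in a small helper so that a policy is only queried for satellites that need a new action. This is a minimal sketch: build_actions and choose_action are hypothetical names, and the example_info dictionary simply repeats the output shown above.

```python
def build_actions(info, sat_names, choose_action):
    """Return one action per satellite, using None to continue the current action."""
    actions = []
    for name in sat_names:
        if info[name]["requires_retasking"]:
            actions.append(choose_action(name))
        else:
            actions.append(None)
    return actions


# Copy of the info dictionary returned by the step above.
example_info = {
    "EO-1": {"requires_retasking": False},
    "EO-2": {"requires_retasking": True},
    "EO-3": {"requires_retasking": False},
    "d_ts": 176.0,
}

example_actions = build_actions(
    example_info,
    ["EO-1", "EO-2", "EO-3"],
    choose_action=lambda name: 0,  # placeholder policy: always pick the next target
)
print(example_actions)  # [None, 0, None]
```

Iterating over satellite names rather than over the info dictionary avoids treating non-satellite keys such as "d_ts" as agents.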
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2025-11-05 22:48:39,301 gym INFO <176.00> === STARTING STEP ===
2025-11-05 22:48:39,302 sats.satellite.EO-2 INFO <176.00> EO-2: target index 0 tasked
2025-11-05 22:48:39,302 sats.satellite.EO-2 INFO <176.00> EO-2: Target(tgt-236) tasked for imaging
2025-11-05 22:48:39,303 sats.satellite.EO-2 INFO <176.00> EO-2: Target(tgt-236) window enabled: 74.8 to 233.4
2025-11-05 22:48:39,304 sats.satellite.EO-2 INFO <176.00> EO-2: setting timed terminal event at 233.4
2025-11-05 22:48:39,320 sats.satellite.EO-2 INFO <234.00> EO-2: timed termination at 233.4 for Target(tgt-236) window
2025-11-05 22:48:39,321 data.base INFO <234.00> Total reward: {}
2025-11-05 22:48:39,322 sats.satellite.EO-2 INFO <234.00> EO-2: Satellite EO-2 requires retasking
2025-11-05 22:48:39,325 gym INFO <234.00> Step reward: 0.0
In this environment, the episode terminates if any agent dies. To demonstrate this, one satellite is forcibly killed.
[10]:
from Basilisk.architecture import messaging
def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )
env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])
2025-11-05 22:48:39,331 gym INFO <234.00> === STARTING STEP ===
2025-11-05 22:48:39,331 sats.satellite.EO-1 INFO <234.00> EO-1: target index 6 tasked
2025-11-05 22:48:39,332 sats.satellite.EO-1 INFO <234.00> EO-1: Target(tgt-84) tasked for imaging
2025-11-05 22:48:39,333 sats.satellite.EO-1 INFO <234.00> EO-1: Target(tgt-84) window enabled: 921.9 to 953.4
2025-11-05 22:48:39,334 sats.satellite.EO-1 INFO <234.00> EO-1: setting timed terminal event at 953.4
2025-11-05 22:48:39,335 sats.satellite.EO-2 INFO <234.00> EO-2: target index 7 tasked
2025-11-05 22:48:39,335 sats.satellite.EO-2 INFO <234.00> EO-2: Target(tgt-394) tasked for imaging
2025-11-05 22:48:39,336 sats.satellite.EO-2 INFO <234.00> EO-2: Target(tgt-394) window enabled: 248.2 to 453.9
2025-11-05 22:48:39,337 sats.satellite.EO-2 INFO <234.00> EO-2: setting timed terminal event at 453.9
2025-11-05 22:48:39,338 sats.satellite.EO-3 INFO <234.00> EO-3: target index 9 tasked
2025-11-05 22:48:39,338 sats.satellite.EO-3 INFO <234.00> EO-3: Target(tgt-835) tasked for imaging
2025-11-05 22:48:39,339 sats.satellite.EO-3 INFO <234.00> EO-3: Target(tgt-835) window enabled: 663.9 to 831.2
2025-11-05 22:48:39,339 sats.satellite.EO-3 INFO <234.00> EO-3: setting timed terminal event at 831.2
2025-11-05 22:48:39,355 sats.satellite.EO-2 INFO <298.00> EO-2: imaged Target(tgt-394)
2025-11-05 22:48:39,356 data.base INFO <298.00> Total reward: {'EO-2': 0.12923178411361247}
2025-11-05 22:48:39,357 sats.satellite.EO-2 INFO <298.00> EO-2: Satellite EO-2 requires retasking
2025-11-05 22:48:39,358 sats.satellite.EO-2 INFO <298.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-05 22:48:39,399 sats.satellite.EO-1 WARNING <298.00> EO-1: failed battery_valid check
2025-11-05 22:48:39,401 gym INFO <298.00> Step reward: -0.8707682158863875
2025-11-05 22:48:39,401 gym INFO <298.00> Episode terminated: True
2025-11-05 22:48:39,402 gym INFO <298.00> Episode truncated: False
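The step reward logged above is consistent with the imaging reward summed over satellites plus a penalty of -1.0 for the failed satellite. The penalty value in this sketch is inferred from the logged numbers rather than read from the environment configuration, so treat it as an assumption. scalar_step_reward is a hypothetical helper.

```python
import math

ASSUMED_FAILURE_PENALTY = -1.0  # inferred from the log output, not read from the env


def scalar_step_reward(per_sat_reward, failed_sats):
    """Sum per-satellite rewards and add the assumed penalty per failed satellite."""
    return sum(per_sat_reward.values()) + ASSUMED_FAILURE_PENALTY * len(failed_sats)


reward_with_failure = scalar_step_reward(
    {"EO-2": 0.12923178411361247}, failed_sats=["EO-1"]
)
print(reward_with_failure)

# Compare against the value reported in the log, within floating-point tolerance.
print(math.isclose(reward_with_failure, -0.8707682158863875, abs_tol=1e-12))
```

Because the Gym API returns a single scalar, the failure of EO-1 and the successful image by EO-2 are folded into one number for the whole constellation.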
PettingZoo API
The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo documentation for a full description of the API. Its inputs and outputs are generally dictionaries keyed by agent name rather than tuples.
[11]:
from bsk_rl import ConstellationTasking
env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()
env.observation_spaces
2025-11-05 22:48:39,410 WARNING Creating logger for new env on PID=4827. Old environments in process may now log times incorrectly.
2025-11-05 22:48:39,534 gym INFO Resetting environment with seed=2232055917
2025-11-05 22:48:39,536 scene.targets INFO Generating 1000 targets
2025-11-05 22:48:39,691 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-05 22:48:39,728 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-05 22:48:39,768 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-05 22:48:39,802 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-05 22:48:39,837 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-11-05 22:48:39,873 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-11-05 22:48:39,908 gym INFO <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
'EO-2': Box(-1e+16, 1e+16, (20,), float64),
'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}
Actions are passed as a dictionary; the agent names can be accessed through the agents property.
[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2025-11-05 22:48:39,921 gym INFO <0.00> === STARTING STEP ===
2025-11-05 22:48:39,922 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2025-11-05 22:48:39,922 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-528) tasked for imaging
2025-11-05 22:48:39,923 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-528) window enabled: 679.7 to 815.0
2025-11-05 22:48:39,924 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 815.0
2025-11-05 22:48:39,924 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2025-11-05 22:48:39,925 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-166) tasked for imaging
2025-11-05 22:48:39,927 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-166) window enabled: 644.8 to 696.8
2025-11-05 22:48:39,927 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 696.8
2025-11-05 22:48:39,928 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2025-11-05 22:48:39,929 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-176) tasked for imaging
2025-11-05 22:48:39,929 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-176) window enabled: 1032.1 to 1200.0
2025-11-05 22:48:39,930 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 1200.0
2025-11-05 22:48:40,064 sats.satellite.EO-2 INFO <647.00> EO-2: imaged Target(tgt-166)
2025-11-05 22:48:40,065 data.base INFO <647.00> Total reward: {'EO-2': 0.9210245875955466}
2025-11-05 22:48:40,066 sats.satellite.EO-2 INFO <647.00> EO-2: Satellite EO-2 requires retasking
2025-11-05 22:48:40,068 sats.satellite.EO-2 INFO <647.00> EO-2: Finding opportunity windows from 1200.00 to 1800.00 seconds
2025-11-05 22:48:40,106 sats.satellite.EO-3 INFO <647.00> EO-3: Finding opportunity windows from 1200.00 to 1800.00 seconds
2025-11-05 22:48:40,166 gym INFO <647.00> Step reward: {'EO-2': 0.9210245875955466}
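When moving code from the tuple-based Gym API to the dictionary-based PettingZoo API, positional actions can be paired with agent names. The helper below is a hypothetical sketch. It drops None entries on the assumption that leaving an agent out of the dictionary plays the same role as passing None in the Gym API.

```python
def actions_to_dict(agents, actions):
    """Pair positional actions with agent names, skipping None entries."""
    if len(agents) != len(actions):
        raise ValueError("expected one action per agent")
    return {
        agent: action
        for agent, action in zip(agents, actions)
        if action is not None
    }


example_agents = ["EO-1", "EO-2", "EO-3"]
print(actions_to_dict(example_agents, [7, 9, 8]))  # {'EO-1': 7, 'EO-2': 9, 'EO-3': 8}
print(actions_to_dict(example_agents, [None, 0, None]))  # {'EO-2': 0}
```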
[14]:
observation
[14]:
{'EO-1': array([ 0.09859741, -0.03551423, 0.49198952, 0.00573547, 0.19679985,
0.01870813, 0.45048406, 0.00248848, 0.55164225, 0.01762653,
0.02325649, 0.01388872, 0.35569037, 0.03332458, 0.02772981,
0.03467764, 0.67272115, 0.04605867, 0.2139736 , 0.07489779]),
'EO-2': array([ 0.47492646, -0.03580397, 0.30780121, -0.03233286, 0.84733851,
0.00786825, 0.98881349, 0.01349388, 0.84796453, 0.05064862,
0.57450089, 0.03511686, 0.9865236 , 0.0674651 , 0.80830606,
0.09871199, 0.69594179, 0.10978892, 0.07363774, 0.14149279]),
'EO-3': array([ 0.11533782, -0.00608844, 0.85873453, 0.01749988, 0.48530866,
0.05399338, 0.02190762, 0.06756675, 0.95281325, 0.0802063 ,
0.67938977, 0.08240643, 0.98664312, 0.08783784, 0.42689825,
0.12397855, 0.42931792, 0.11831221, 0.96460181, 0.14865361])}
Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.
[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2025-11-05 22:48:40,182 gym INFO <647.00> === STARTING STEP ===
2025-11-05 22:48:40,183 sats.satellite.EO-1 INFO <647.00> EO-1: target index 7 tasked
2025-11-05 22:48:40,184 sats.satellite.EO-1 INFO <647.00> EO-1: Target(tgt-328) tasked for imaging
2025-11-05 22:48:40,185 sats.satellite.EO-1 INFO <647.00> EO-1: Target(tgt-328) window enabled: 844.7 to 1037.7
2025-11-05 22:48:40,185 sats.satellite.EO-1 INFO <647.00> EO-1: setting timed terminal event at 1037.7
2025-11-05 22:48:40,186 sats.satellite.EO-2 INFO <647.00> EO-2: target index 9 tasked
2025-11-05 22:48:40,187 sats.satellite.EO-2 INFO <647.00> EO-2: Target(tgt-704) tasked for imaging
2025-11-05 22:48:40,188 sats.satellite.EO-2 INFO <647.00> EO-2: Target(tgt-704) window enabled: 1453.5 to 1573.9
2025-11-05 22:48:40,188 sats.satellite.EO-2 INFO <647.00> EO-2: setting timed terminal event at 1573.9
2025-11-05 22:48:40,230 sats.satellite.EO-1 INFO <847.00> EO-1: imaged Target(tgt-328)
2025-11-05 22:48:40,232 data.base INFO <847.00> Total reward: {'EO-1': 0.027729808348439078}
2025-11-05 22:48:40,232 sats.satellite.EO-1 INFO <847.00> EO-1: Satellite EO-1 requires retasking
2025-11-05 22:48:40,234 sats.satellite.EO-1 INFO <847.00> EO-1: Finding opportunity windows from 1200.00 to 1800.00 seconds
2025-11-05 22:48:40,281 sats.satellite.EO-2 INFO <847.00> EO-2: Finding opportunity windows from 1800.00 to 2400.00 seconds
2025-11-05 22:48:40,318 gym INFO <847.00> Step reward: {'EO-1': -0.9722701916515609}
2025-11-05 22:48:40,319 gym INFO <847.00> Episode terminated: ['EO-1']
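Because termination is reported per agent, a control loop can keep stepping the remaining satellites after one fails. The sketch below filters out finished agents. The terminated and truncated dictionaries are illustrative values that follow the PettingZoo parallel API convention of booleans keyed by agent; they are not copied from the output above, and remaining_agents is a hypothetical helper.

```python
def remaining_agents(agents, terminated, truncated):
    """Return the agents that are neither terminated nor truncated."""
    return [
        agent
        for agent in agents
        if not terminated.get(agent, False) and not truncated.get(agent, False)
    ]


# Illustrative values mirroring the step above, where only EO-1 terminated.
example_agents = ["EO-1", "EO-2", "EO-3"]
example_terminated = {"EO-1": True, "EO-2": False, "EO-3": False}
example_truncated = {"EO-1": False, "EO-2": False, "EO-3": False}

still_active = remaining_agents(example_agents, example_terminated, example_truncated)
print(still_active)  # ['EO-2', 'EO-3']
```

Only the agents in the returned list would then be given actions in the next call to step.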