Multi-Agent Environments
Two multi-agent environments are provided in the package:
GeneralSatelliteTasking, a Gymnasium-based environment that serves as the basis for all other environments.
ConstellationTasking, which implements the PettingZoo parallel API.
The latter is preferable for multi-agent RL (MARL) settings, as most MARL algorithms are designed for this kind of API.
Configuring the Environment
For this example, a multi-satellite target imaging environment will be used. The goal is to maximize the value of unique images taken.
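To make the objective concrete, here is a minimal pure-Python sketch of a unique-image reward. It assumes each target carries a priority value that is paid out only the first time any satellite images it. The class name UniqueImageRewardSketch and its step_rewards method are hypothetical; this is not the bsk_rl data.UniqueImageReward implementation.

```python
class UniqueImageRewardSketch:
    """Toy unique-image reward (hypothetical, not the bsk_rl class).

    Each target pays out its priority only the first time any satellite images it.
    """

    def __init__(self, priorities):
        self.priorities = priorities  # target id -> priority value
        self.imaged = set()  # targets already imaged by any satellite

    def step_rewards(self, new_images):
        """new_images maps satellite name -> list of target ids imaged this step."""
        rewards = {}
        for sat, targets in new_images.items():
            for tgt in targets:
                if tgt in self.imaged:
                    continue  # duplicate image of a known target: no value
                self.imaged.add(tgt)
                rewards[sat] = rewards.get(sat, 0.0) + self.priorities[tgt]
        return rewards


rewarder = UniqueImageRewardSketch({"tgt-1": 0.5, "tgt-2": 0.25})
print(rewarder.step_rewards({"EO-1": ["tgt-1"]}))           # {'EO-1': 0.5}
print(rewarder.step_rewards({"EO-2": ["tgt-1", "tgt-2"]}))  # {'EO-2': 0.25}
```

A step with no new unique images returns an empty dictionary in this sketch.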
As usual, the satellite type is defined first.
[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw
class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel
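With two properties observed for each of the next 10 targets, each satellite's observation is a 20-element vector, which matches the Box(..., (20,), float64) spaces printed below. The sketch that follows illustrates that flattening under stated assumptions: opportunities arrive as (priority, opportunity_open) pairs, soonest first, with opportunity_open in seconds, and at least n_ahead_observe of them are available (no padding). The function flatten_opportunities is hypothetical, not the bsk_rl observation code.

```python
def flatten_opportunities(opportunities, n_ahead_observe=10, norm=5700.0):
    """Hypothetical sketch: flatten (priority, opportunity_open) pairs into one vector.

    opportunities: list of (priority, opportunity_open_seconds), soonest first.
    Assumes at least n_ahead_observe opportunities are available (no padding).
    """
    flat = []
    for priority, t_open in opportunities[:n_ahead_observe]:
        flat.append(priority)
        flat.append(t_open / norm)  # same scaling as norm=5700.0 in the spec
    return flat


opps = [(0.5, 60.0 * k) for k in range(12)]  # 12 made-up upcoming targets
obs_vec = flatten_opportunities(opps)
print(len(obs_vec))  # 20 = 10 targets x 2 properties
```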
Satellite properties are set to give each satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.
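In a Walker-delta t/p/f pattern, t satellites are split evenly over p planes spread in right ascension of the ascending node (RAAN), and f sets the phase offset between adjacent planes (f * 360 / t degrees). With a single plane, as with n_planes=1 here, this reduces to spacing the satellites evenly around one orbit. The helper below is a hypothetical illustration of that geometry, not the bsk_rl walker_delta_args function. It assumes t equals the number of satellites in the environment.

```python
def walker_delta_elements(t, p, f, inc_deg):
    """Illustrative Walker-delta t/p/f layout (hypothetical helper).

    Returns (RAAN, argument of latitude, inclination) tuples in degrees.
    """
    s = t // p  # satellites per plane
    elements = []
    for plane in range(p):
        raan = 360.0 * plane / p  # planes evenly spread in RAAN
        for k in range(s):
            # even in-plane spacing plus the inter-plane phasing offset f * 360 / t
            u = (360.0 * k / s + 360.0 * f * plane / t) % 360.0
            elements.append((raan, u, inc_deg))
    return elements


for elem in walker_delta_elements(t=3, p=1, f=0, inc_deg=60.0):
    print(elem)
# (0.0, 0.0, 60.0)
# (0.0, 120.0, 60.0)
# (0.0, 240.0, 60.0)
```

With one plane, the phasing parameter f has no effect because there is no adjacent plane to offset against.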
[2]:
from bsk_rl.utils.orbital import walker_delta_args
sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
Gym API
GeneralSatelliteTasking uses tuples of actions and observations, with one entry per satellite, to interact with the environment.
[3]:
from bsk_rl import GeneralSatelliteTasking
env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn_type must inherit from dyn.LOSCommDynModel
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()
env.observation_space
2026-05-19 20:29:30,227 gym INFO Resetting environment with seed=986115960
2026-05-19 20:29:30,230 scene.targets INFO Generating 1000 targets
2026-05-19 20:29:30,298 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2026-05-19 20:29:30,332 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2026-05-19 20:29:30,367 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2026-05-19 20:29:30,403 gym INFO <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))
Consequently, actions are passed as a tuple (or list) with one entry per satellite, in the same order as the satellites. Each step ends the first time any satellite completes its action.
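A toy model of this stopping rule, using made-up finish times and a hypothetical function advance_to_first_completion (not bsk_rl internals): the step ends at the earliest finish time, and only satellites whose actions end by then are flagged for retasking. The returned dictionary imitates the layout of the info output printed later in this example.

```python
def advance_to_first_completion(now, finish_times):
    """Toy model of one multi-satellite step (hypothetical, not bsk_rl internals).

    finish_times maps satellite name -> sim time at which its current action ends.
    The step ends at the earliest finish time; satellites finishing by then need retasking.
    """
    end = min(finish_times.values())
    info = {name: {"requires_retasking": t <= end} for name, t in finish_times.items()}
    info["d_ts"] = end - now  # sim time elapsed during this step
    return end, info


# Made-up finish times for three satellites
end, info = advance_to_first_completion(0.0, {"EO-1": 196.0, "EO-2": 300.0, "EO-3": 450.0})
print(end)   # 196.0
print(info)  # {'EO-1': {'requires_retasking': True}, 'EO-2': {'requires_retasking': False}, 'EO-3': {'requires_retasking': False}, 'd_ts': 196.0}
```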
[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2026-05-19 20:29:30,415 gym INFO <0.00> === STARTING STEP ===
2026-05-19 20:29:30,416 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2026-05-19 20:29:30,416 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-259) tasked for imaging
2026-05-19 20:29:30,417 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-259) window enabled: 193.8 to 400.5
2026-05-19 20:29:30,418 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 400.5
2026-05-19 20:29:30,419 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2026-05-19 20:29:30,419 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-630) tasked for imaging
2026-05-19 20:29:30,420 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-630) window enabled: 274.1 to 432.4
2026-05-19 20:29:30,421 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 432.4
2026-05-19 20:29:30,421 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2026-05-19 20:29:30,422 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-541) tasked for imaging
2026-05-19 20:29:30,423 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-541) window enabled: 400.5 to 581.6
2026-05-19 20:29:30,423 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 581.6
2026-05-19 20:29:30,474 sats.satellite.EO-1 INFO <196.00> EO-1: imaged Target(tgt-259)
2026-05-19 20:29:30,475 data.base INFO <196.00> Total reward: {'EO-1': 0.3953655843250563}
2026-05-19 20:29:30,475 sats.satellite.EO-1 INFO <196.00> EO-1: Satellite EO-1 requires retasking
2026-05-19 20:29:30,476 sats.satellite.EO-1 INFO <196.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-05-19 20:29:30,516 gym INFO <196.00> Step reward: 0.3953655843250563
[6]:
observation
[6]:
(array([ 0.72534076, -0.02489984, 0.26884405, -0.01628331, 0.86900513,
-0.01020251, 0.66039315, 0.0302466 , 0.64537845, 0.06924374,
0.73550828, 0.06478752, 0.20670747, 0.0790755 , 0.78172605,
0.07706475, 0.87444427, 0.09055819, 0.34683973, 0.08514924]),
array([ 0.32339267, -0.01779771, 0.63288115, -0.00684584, 0.40888213,
-0.00439092, 0.74739081, 0.01610123, 0.4285398 , 0.01369686,
0.49678816, 0.00730153, 0.2863464 , 0.0101031 , 0.17725107,
0.02201616, 0.65787583, 0.04882878, 0.71440445, 0.05597138]),
array([ 0.16463002, -0.00519807, 0.11733033, 0.00695053, 0.08092027,
0.01097047, 0.82478401, 0.02482425, 0.708638 , 0.01243765,
0.23996745, 0.01339228, 0.37748072, 0.03587805, 0.16080077,
0.06303182, 0.61538618, 0.04830295, 0.42336447, 0.05430929]))
At this point, each satellite can either be retasked with a new action or continue its previous action if None is passed as its action. To see which satellites must be retasked (i.e., their previous action is done and they have nothing more to do), check "requires_retasking" in each satellite's info.
[7]:
info
[7]:
{'EO-1': {'requires_retasking': True},
'EO-2': {'requires_retasking': False},
'EO-3': {'requires_retasking': False},
'd_ts': 196.0}
Based on this information, only the satellite that requires retasking is given a new action here.
[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[0, None, None]
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2026-05-19 20:29:30,536 gym INFO <196.00> === STARTING STEP ===
2026-05-19 20:29:30,537 sats.satellite.EO-1 INFO <196.00> EO-1: target index 0 tasked
2026-05-19 20:29:30,537 sats.satellite.EO-1 INFO <196.00> EO-1: Target(tgt-44) tasked for imaging
2026-05-19 20:29:30,538 sats.satellite.EO-1 INFO <196.00> EO-1: Target(tgt-44) window enabled: 54.1 to 223.1
2026-05-19 20:29:30,538 sats.satellite.EO-1 INFO <196.00> EO-1: setting timed terminal event at 223.1
2026-05-19 20:29:30,547 sats.satellite.EO-1 INFO <224.00> EO-1: timed termination at 223.1 for Target(tgt-44) window
2026-05-19 20:29:30,548 data.base INFO <224.00> Total reward: {}
2026-05-19 20:29:30,548 sats.satellite.EO-1 INFO <224.00> EO-1: Satellite EO-1 requires retasking
2026-05-19 20:29:30,551 gym INFO <224.00> Step reward: 0.0
In GeneralSatelliteTasking, the whole episode terminates if any satellite dies. To demonstrate this, one satellite is forcibly killed by overriding its is_alive check.
[10]:
from Basilisk.architecture import messaging
def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    # Write a zero battery level so the satellite's battery_valid check fails
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )
env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])
2026-05-19 20:29:30,556 gym INFO <224.00> === STARTING STEP ===
2026-05-19 20:29:30,557 sats.satellite.EO-1 INFO <224.00> EO-1: target index 6 tasked
2026-05-19 20:29:30,557 sats.satellite.EO-1 INFO <224.00> EO-1: Target(tgt-924) tasked for imaging
2026-05-19 20:29:30,558 sats.satellite.EO-1 INFO <224.00> EO-1: Target(tgt-924) window enabled: 635.3 to 762.4
2026-05-19 20:29:30,558 sats.satellite.EO-1 INFO <224.00> EO-1: setting timed terminal event at 762.4
2026-05-19 20:29:30,559 sats.satellite.EO-2 INFO <224.00> EO-2: target index 7 tasked
2026-05-19 20:29:30,559 sats.satellite.EO-2 INFO <224.00> EO-2: Target(tgt-79) tasked for imaging
2026-05-19 20:29:30,560 sats.satellite.EO-2 INFO <224.00> EO-2: Target(tgt-79) window enabled: 321.5 to 524.7
2026-05-19 20:29:30,560 sats.satellite.EO-2 INFO <224.00> EO-2: setting timed terminal event at 524.7
2026-05-19 20:29:30,561 sats.satellite.EO-3 INFO <224.00> EO-3: target index 9 tasked
2026-05-19 20:29:30,562 sats.satellite.EO-3 INFO <224.00> EO-3: Target(tgt-658) tasked for imaging
2026-05-19 20:29:30,562 sats.satellite.EO-3 INFO <224.00> EO-3: Target(tgt-658) window enabled: 505.6 to 600.0
2026-05-19 20:29:30,563 sats.satellite.EO-3 INFO <224.00> EO-3: setting timed terminal event at 600.0
2026-05-19 20:29:30,589 sats.satellite.EO-2 INFO <324.00> EO-2: imaged Target(tgt-79)
2026-05-19 20:29:30,590 data.base INFO <324.00> Total reward: {'EO-2': 0.17725106982234673}
2026-05-19 20:29:30,590 sats.satellite.EO-2 INFO <324.00> EO-2: Satellite EO-2 requires retasking
2026-05-19 20:29:30,591 sats.satellite.EO-2 INFO <324.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-05-19 20:29:30,643 sats.satellite.EO-1 WARNING <324.00> EO-1: failed battery_valid check
2026-05-19 20:29:30,644 gym INFO <324.00> Step reward: -0.8227489301776533
2026-05-19 20:29:30,644 gym INFO <324.00> Episode terminated: True
2026-05-19 20:29:30,645 gym INFO <324.00> Episode truncated: False
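The logged step reward is consistent with summing the per-satellite rewards and adding a penalty of -1.0 for the satellite that failed: 0.17725106982234673 + (-1.0) ≈ -0.82275. The sketch below reproduces that bookkeeping together with the rule that any dead satellite terminates the episode. The -1.0 penalty is inferred from the logs, and gym_step_outcome is a hypothetical helper, not bsk_rl code.

```python
def gym_step_outcome(step_rewards, alive, failure_penalty=-1.0):
    """Sketch of the aggregate step outcome suggested by the log (assumed, not bsk_rl code).

    step_rewards: satellite name -> reward earned this step
    alive: satellite name -> whether the satellite is still alive
    """
    reward = sum(step_rewards.values())
    # apply the (inferred) penalty once per dead satellite
    reward += failure_penalty * sum(1 for ok in alive.values() if not ok)
    terminated = not all(alive.values())  # one dead satellite ends the episode
    return reward, terminated


reward, terminated = gym_step_outcome(
    {"EO-2": 0.17725106982234673},
    {"EO-1": False, "EO-2": True, "EO-3": True},
)
print(round(reward, 6), terminated)  # -0.822749 True
```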
PettingZoo API
The PettingZoo parallel API environment, ConstellationTasking, is otherwise largely the same as GeneralSatelliteTasking; see the PettingZoo documentation for a full description of the parallel API. The main difference is that it separates things into dictionaries keyed by agent name rather than tuples.
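Since both APIs carry the same per-satellite data, converting between them amounts to pairing values with agent names. The helpers tuple_to_agent_dict and agent_dict_to_tuple are hypothetical and shown only to illustrate the correspondence; they assume the tuple order matches the order of env.agents.

```python
def tuple_to_agent_dict(agents, values):
    """Hypothetical helper: map Gym-style per-satellite tuples onto PettingZoo-style dicts."""
    if len(agents) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agents, values))


def agent_dict_to_tuple(agents, mapping, default=None):
    """Inverse mapping; agents missing from the dict get `default`.

    None is the Gym-API value for "continue the previous action".
    """
    return tuple(mapping.get(agent, default) for agent in agents)


agents = ["EO-1", "EO-2", "EO-3"]
print(tuple_to_agent_dict(agents, (7, 9, 8)))    # {'EO-1': 7, 'EO-2': 9, 'EO-3': 8}
print(agent_dict_to_tuple(agents, {"EO-1": 0}))  # (0, None, None)
```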
[11]:
from bsk_rl import ConstellationTasking
env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn_type must inherit from dyn.LOSCommDynModel
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()
env.observation_spaces
2026-05-19 20:29:30,652 WARNING Creating logger for new env on PID=4395. Old environments in process may now log times incorrectly.
2026-05-19 20:29:30,654 gym INFO Resetting environment with seed=1412607029
2026-05-19 20:29:30,656 scene.targets INFO Generating 1000 targets
2026-05-19 20:29:30,689 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2026-05-19 20:29:30,726 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2026-05-19 20:29:30,762 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2026-05-19 20:29:30,803 gym INFO <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
'EO-2': Box(-1e+16, 1e+16, (20,), float64),
'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}
Actions are passed as a dictionary keyed by agent name; the agent names can be accessed through the agents property.
[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2026-05-19 20:29:30,814 gym INFO <0.00> === STARTING STEP ===
2026-05-19 20:29:30,814 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2026-05-19 20:29:30,814 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-480) tasked for imaging
2026-05-19 20:29:30,815 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-480) window enabled: 90.2 to 236.7
2026-05-19 20:29:30,816 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 236.7
2026-05-19 20:29:30,816 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2026-05-19 20:29:30,817 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-676) tasked for imaging
2026-05-19 20:29:30,817 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-676) window enabled: 299.9 to 461.9
2026-05-19 20:29:30,818 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 461.9
2026-05-19 20:29:30,819 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2026-05-19 20:29:30,819 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-195) tasked for imaging
2026-05-19 20:29:30,820 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-195) window enabled: 131.7 to 332.7
2026-05-19 20:29:30,820 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 332.7
2026-05-19 20:29:30,840 sats.satellite.EO-1 INFO <93.00> EO-1: imaged Target(tgt-480)
2026-05-19 20:29:30,841 data.base INFO <93.00> Total reward: {'EO-1': 0.7060713096245507}
2026-05-19 20:29:30,841 sats.satellite.EO-1 INFO <93.00> EO-1: Satellite EO-1 requires retasking
2026-05-19 20:29:30,845 gym INFO <93.00> Step reward: {'EO-1': 0.7060713096245507}
[14]:
observation
[14]:
{'EO-1': array([ 0.89440326, -0.01631579, 0.27299245, -0.01077397, 0.66186383,
-0.01631579, 0.84700055, -0.01631579, 0.05380567, -0.01631579,
0.1727596 , 0.00579201, 0.38227638, 0.02283473, 0.74101799,
0.0194658 , 0.0456438 , 0.05247228, 0.71245699, 0.08077344]),
'EO-2': array([ 0.99443799, -0.00802161, 0.09877264, 0.00389612, 0.04392426,
0.01743539, 0.26346661, 0.02031343, 0.56174605, 0.05154918,
0.50189443, 0.0282038 , 0.4225443 , 0.03837597, 0.31099963,
0.03630247, 0.6447967 , 0.02892887, 0.2905543 , 0.06153436]),
'EO-3': array([ 0.69747148, -0.01143501, 0.72105094, -0.01631579, 0.96783275,
0.00309007, 0.10655504, -0.01631579, 0.189789 , -0.00632669,
0.38362972, 0.00280303, 0.52188217, 0.00678178, 0.03899653,
0.00812659, 0.58447659, 0.01703968, 0.88086894, 0.01099811])}
Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows individual agents to fail without terminating the entire environment.
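In the parallel API, termination is tracked per agent instead of for the whole episode. Here is a minimal sketch, assuming the standard PettingZoo convention that terminated agents are dropped from env.agents. The helper remove_dead_agents is hypothetical and not part of bsk_rl or PettingZoo.

```python
def remove_dead_agents(agents, alive):
    """Sketch of PettingZoo-style per-agent termination (assumed behavior, not bsk_rl code).

    Returns per-agent terminations and the agents that remain active.
    """
    terminations = {agent: not alive[agent] for agent in agents}
    remaining = [agent for agent in agents if not terminations[agent]]
    return terminations, remaining


terminations, remaining = remove_dead_agents(
    ["EO-1", "EO-2", "EO-3"], {"EO-1": False, "EO-2": True, "EO-3": True}
)
print(terminations)  # {'EO-1': True, 'EO-2': False, 'EO-3': False}
print(remaining)     # ['EO-2', 'EO-3']
```

The surviving satellites keep stepping while only the failed one is reported as terminated, which matches the final log line of this example.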
[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
# EO-3 is not given an action, so it continues its current task
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2026-05-19 20:29:30,860 gym INFO <93.00> === STARTING STEP ===
2026-05-19 20:29:30,861 sats.satellite.EO-1 INFO <93.00> EO-1: target index 7 tasked
2026-05-19 20:29:30,861 sats.satellite.EO-1 INFO <93.00> EO-1: Target(tgt-909) tasked for imaging
2026-05-19 20:29:30,862 sats.satellite.EO-1 INFO <93.00> EO-1: Target(tgt-909) window enabled: 204.0 to 405.0
2026-05-19 20:29:30,862 sats.satellite.EO-1 INFO <93.00> EO-1: setting timed terminal event at 405.0
2026-05-19 20:29:30,863 sats.satellite.EO-2 INFO <93.00> EO-2: target index 9 tasked
2026-05-19 20:29:30,864 sats.satellite.EO-2 INFO <93.00> EO-2: Target(tgt-50) tasked for imaging
2026-05-19 20:29:30,864 sats.satellite.EO-2 INFO <93.00> EO-2: Target(tgt-50) window enabled: 443.7 to 600.0
2026-05-19 20:29:30,865 sats.satellite.EO-2 INFO <93.00> EO-2: setting timed terminal event at 600.0
2026-05-19 20:29:30,875 sats.satellite.EO-3 INFO <134.00> EO-3: imaged Target(tgt-195)
2026-05-19 20:29:30,875 data.base INFO <134.00> Total reward: {'EO-3': 0.5218821736595285}
2026-05-19 20:29:30,876 sats.satellite.EO-3 INFO <134.00> EO-3: Satellite EO-3 requires retasking
2026-05-19 20:29:30,878 sats.satellite.EO-1 INFO <134.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2026-05-19 20:29:30,934 gym INFO <134.00> Step reward: {'EO-1': -1.0, 'EO-3': 0.5218821736595285}
2026-05-19 20:29:30,934 gym INFO <134.00> Episode terminated: ['EO-1']