Multi-Agent Environments
Two multi-agent environments are provided in the package:
GeneralSatelliteTasking, a Gymnasium-based environment and the basis for all other environments.
ConstellationTasking, which implements the PettingZoo parallel API.
The latter is preferable for multi-agent RL (MARL) settings, as most algorithms are designed for this kind of API.
Configuring the Environment
For this example, a multi-satellite target imaging environment is used. The goal is to maximize the total value of unique images taken.
As usual, the satellite type is defined first.
[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel
Satellite properties are set to give each satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.
[2]:
from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
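For readers unfamiliar with the Walker-delta pattern, the geometry can be sketched in a few lines. This is a minimal illustration of the textbook pattern, not the implementation behind walker_delta_args; the function name walker_delta_phasing and its parameters are hypothetical. With a single plane, the pattern reduces to satellites spaced evenly around one orbit.

```python
def walker_delta_phasing(n_sats, n_planes=1, rel_phasing=0):
    """Return (RAAN, in-plane angle) in degrees for each satellite.

    Textbook Walker-delta layout: planes are spread evenly in RAAN,
    satellites are spread evenly within each plane, and adjacent planes
    are offset in phase by rel_phasing * 360 / n_sats degrees.
    """
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    per_plane = n_sats // n_planes
    slots = []
    for plane in range(n_planes):
        raan = 360.0 * plane / n_planes
        for k in range(per_plane):
            in_plane = 360.0 * k / per_plane
            offset = 360.0 * rel_phasing * plane / n_sats
            slots.append((raan, (in_plane + offset) % 360.0))
    return slots


# Three satellites in one plane, as in this example
print(walker_delta_phasing(3, n_planes=1))
# [(0.0, 0.0), (0.0, 120.0), (0.0, 240.0)]
```

The altitude and inclination passed to walker_delta_args above would apply to every slot; only the angles differ between satellites in this sketch.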
Gym API
GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.
[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
2025-06-20 19:56:32,548 gym INFO Resetting environment with seed=3223913771
2025-06-20 19:56:32,551 scene.targets INFO Generating 1000 targets
2025-06-20 19:56:32,695 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-06-20 19:56:32,730 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-06-20 19:56:32,775 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-06-20 19:56:32,808 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-06-20 19:56:32,849 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-06-20 19:56:32,884 gym INFO <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))
Consequently, actions are passed as a tuple with one entry per satellite. The step ends as soon as any satellite completes its action.
[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2025-06-20 19:56:32,898 gym INFO <0.00> === STARTING STEP ===
2025-06-20 19:56:32,898 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2025-06-20 19:56:32,899 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-669) tasked for imaging
2025-06-20 19:56:32,901 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-669) window enabled: 599.6 to 788.6
2025-06-20 19:56:32,901 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 788.6
2025-06-20 19:56:32,903 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2025-06-20 19:56:32,903 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-71) tasked for imaging
2025-06-20 19:56:32,904 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-71) window enabled: 686.4 to 895.3
2025-06-20 19:56:32,905 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 895.3
2025-06-20 19:56:32,906 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2025-06-20 19:56:32,906 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-661) tasked for imaging
2025-06-20 19:56:32,908 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-661) window enabled: 501.6 to 600.0
2025-06-20 19:56:32,909 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 600.0
2025-06-20 19:56:33,219 sats.satellite.EO-3 INFO <504.00> EO-3: imaged Target(tgt-661)
2025-06-20 19:56:33,222 data.base INFO <504.00> Total reward: {'EO-3': 0.9945356680616969}
2025-06-20 19:56:33,228 sats.satellite.EO-3 INFO <504.00> EO-3: Satellite EO-3 requires retasking
2025-06-20 19:56:33,229 sats.satellite.EO-3 INFO <504.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-06-20 19:56:33,272 gym INFO <504.00> Step reward: 0.9945356680616969
[6]:
observation
[6]:
(array([ 0.52222295, -0.01585541, 0.27618759, 0.01677876, 0.26476257,
0.007232 , 0.12699221, 0.03167082, 0.50798445, 0.02818074,
0.2807055 , 0.02299049, 0.55789811, 0.04161632, 0.64037085,
0.05884404, 0.55224332, 0.06296221, 0.2945313 , 0.06580253]),
array([ 4.08919388e-01, -6.49386289e-05, 8.85082305e-01, 2.46810018e-03,
6.23219622e-01, 4.07608835e-02, 7.30684180e-02, 2.36434926e-02,
1.28678833e-01, 3.19995123e-02, 2.43969684e-01, 5.01919075e-02,
7.97015449e-01, 6.27320447e-02, 9.26415189e-01, 8.96370902e-02,
8.18879775e-01, 5.74237149e-02, 5.64014081e-01, 7.24127796e-02]),
array([ 8.41043162e-01, -2.29649068e-02, 8.55071476e-01, -8.44495004e-03,
1.34102048e-01, -1.55161337e-04, 5.70196134e-01, 3.88384721e-04,
6.53159665e-01, 4.41917505e-02, 9.83475334e-01, 2.89384165e-02,
7.99323597e-01, 2.80787028e-02, 1.19191283e-02, 4.87616798e-02,
4.60385547e-01, 4.37518134e-02, 5.13797007e-01, 6.38856108e-02]))
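Each 20-element vector can be read as ten upcoming targets with two properties each. The sketch below assumes the layout suggested by the printed values: entries alternate between priority and the normalized opportunity_open time, ordered by target. It also assumes the time is measured in seconds relative to the current simulation time, which is consistent with the negative leading values. The helper name decode_observation is hypothetical and not part of bsk_rl.

```python
def decode_observation(flat_obs, n_ahead=10, open_norm=5700.0):
    """Split a flat observation into per-target records.

    Assumed layout (inferred from the output above, not from the library
    source): [priority_0, open_0 / norm, priority_1, open_1 / norm, ...].
    """
    props_per_target = 2
    if len(flat_obs) != n_ahead * props_per_target:
        raise ValueError("unexpected observation length")
    decoded = []
    for i in range(n_ahead):
        priority = float(flat_obs[props_per_target * i])
        # Undo the norm=5700.0 scaling from the observation spec
        opens_in_s = float(flat_obs[props_per_target * i + 1]) * open_norm
        decoded.append({"priority": priority, "opens_in_s": opens_in_s})
    return decoded


# First two targets of EO-1's observation above
for record in decode_observation(
    [0.52222295, -0.01585541, 0.27618759, 0.01677876], n_ahead=2
):
    print(record)
```

Under these assumptions, a negative opens_in_s means the window for that target is already open.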
At this point, either every satellite can be retasked, or satellites can continue their previous action by passing None as the action. To see which satellites must be retasked (i.e. their previous action is done and they have nothing more to do), look at "requires_retasking" in each satellite's info.
[7]:
info
[7]:
{'EO-1': {'requires_retasking': False},
'EO-2': {'requires_retasking': False},
'EO-3': {'requires_retasking': True},
'd_ts': 504.00000000000006}
Based on this info dictionary, only the satellite that requires retasking is given a new action here.
[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[None, None, 0]
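The same pattern generalizes to an arbitrary policy. The sketch below is a plain-Python helper that works on an info dictionary of the shape shown above; select_actions and the policy callable are hypothetical names introduced for illustration.

```python
def select_actions(info, sat_names, policy):
    """Build a per-satellite action list for the tuple-based API.

    Satellites that do not require retasking receive None, which lets
    them continue their current action.
    """
    actions = []
    for name in sat_names:
        if info[name]["requires_retasking"]:
            actions.append(policy(name))
        else:
            actions.append(None)
    return actions


info_example = {
    "EO-1": {"requires_retasking": False},
    "EO-2": {"requires_retasking": False},
    "EO-3": {"requires_retasking": True},
    "d_ts": 504.0,
}
print(select_actions(info_example, ["EO-1", "EO-2", "EO-3"], lambda name: 0))
# [None, None, 0]
```

Iterating over an explicit list of satellite names, rather than over the keys of info, avoids tripping over non-satellite entries such as "d_ts".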
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2025-06-20 19:56:33,294 gym INFO <504.00> === STARTING STEP ===
2025-06-20 19:56:33,295 sats.satellite.EO-3 INFO <504.00> EO-3: target index 0 tasked
2025-06-20 19:56:33,296 sats.satellite.EO-3 INFO <504.00> EO-3: Target(tgt-751) tasked for imaging
2025-06-20 19:56:33,297 sats.satellite.EO-3 INFO <504.00> EO-3: Target(tgt-751) window enabled: 373.1 to 556.3
2025-06-20 19:56:33,298 sats.satellite.EO-3 INFO <504.00> EO-3: setting timed terminal event at 556.3
2025-06-20 19:56:33,330 sats.satellite.EO-3 INFO <557.00> EO-3: timed termination at 556.3 for Target(tgt-751) window
2025-06-20 19:56:33,333 data.base INFO <557.00> Total reward: {}
2025-06-20 19:56:33,334 sats.satellite.EO-3 INFO <557.00> EO-3: Satellite EO-3 requires retasking
2025-06-20 19:56:33,337 gym INFO <557.00> Step reward: 0.0
In this environment, the episode terminates if any agent dies. To demonstrate this, one satellite is forcibly killed.
[10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])
2025-06-20 19:56:33,344 gym INFO <557.00> === STARTING STEP ===
2025-06-20 19:56:33,345 sats.satellite.EO-1 INFO <557.00> EO-1: target index 6 tasked
2025-06-20 19:56:33,345 sats.satellite.EO-1 INFO <557.00> EO-1: Target(tgt-175) tasked for imaging
2025-06-20 19:56:33,347 sats.satellite.EO-1 INFO <557.00> EO-1: Target(tgt-175) window enabled: 741.2 to 898.2
2025-06-20 19:56:33,347 sats.satellite.EO-1 INFO <557.00> EO-1: setting timed terminal event at 898.2
2025-06-20 19:56:33,348 sats.satellite.EO-2 INFO <557.00> EO-2: target index 7 tasked
2025-06-20 19:56:33,349 sats.satellite.EO-2 INFO <557.00> EO-2: Target(tgt-218) tasked for imaging
2025-06-20 19:56:33,350 sats.satellite.EO-2 INFO <557.00> EO-2: Target(tgt-218) window enabled: 1014.9 to 1033.6
2025-06-20 19:56:33,351 sats.satellite.EO-2 INFO <557.00> EO-2: setting timed terminal event at 1033.6
2025-06-20 19:56:33,352 sats.satellite.EO-3 INFO <557.00> EO-3: target index 9 tasked
2025-06-20 19:56:33,352 sats.satellite.EO-3 INFO <557.00> EO-3: Target(tgt-336) tasked for imaging
2025-06-20 19:56:33,354 sats.satellite.EO-3 INFO <557.00> EO-3: Target(tgt-336) window enabled: 835.6 to 986.4
2025-06-20 19:56:33,354 sats.satellite.EO-3 INFO <557.00> EO-3: setting timed terminal event at 986.4
2025-06-20 19:56:33,467 sats.satellite.EO-1 INFO <744.00> EO-1: imaged Target(tgt-175)
2025-06-20 19:56:33,470 data.base INFO <744.00> Total reward: {'EO-1': 0.5578981087199643}
2025-06-20 19:56:33,473 sats.satellite.EO-1 INFO <744.00> EO-1: Satellite EO-1 requires retasking
2025-06-20 19:56:33,475 sats.satellite.EO-1 WARNING <744.00> EO-1: failed battery_valid check
2025-06-20 19:56:33,476 gym INFO <744.00> Step reward: -0.44210189128003574
2025-06-20 19:56:33,477 gym INFO <744.00> Episode terminated: True
2025-06-20 19:56:33,477 gym INFO <744.00> Episode truncated: False
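The negative step reward in the log can be reproduced by hand. The sketch below assumes that the single scalar reward is the sum of the per-satellite rewards plus a penalty of -1.0 for each failed satellite; the penalty value is inferred from the logged numbers rather than taken from the library source, and aggregate_step_reward is a hypothetical name.

```python
def aggregate_step_reward(per_sat_reward, failed_sats, failure_penalty=-1.0):
    """Combine per-satellite rewards into one scalar.

    Assumption: each failed satellite contributes failure_penalty on top
    of whatever imaging reward was earned during the step.
    """
    return sum(per_sat_reward.values()) + failure_penalty * len(failed_sats)


# EO-1 imaged a target worth about 0.558, then failed its battery check
step_reward = aggregate_step_reward({"EO-1": 0.5578981087199643}, ["EO-1"])
print(step_reward)
```

The result matches the logged step reward of about -0.442.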
PettingZoo API
The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo documentation for a full description of the API. It returns and accepts dictionaries keyed by agent name, rather than tuples.
[11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
2025-06-20 19:56:33,486 WARNING Creating logger for new env on PID=4615. Old environments in process may now log times incorrectly.
2025-06-20 19:56:33,489 gym INFO Resetting environment with seed=4164575004
2025-06-20 19:56:33,491 scene.targets INFO Generating 1000 targets
2025-06-20 19:56:33,718 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-06-20 19:56:33,755 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-06-20 19:56:33,795 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-06-20 19:56:33,831 gym INFO <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
'EO-2': Box(-1e+16, 1e+16, (20,), float64),
'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}
Actions are passed as a dictionary; the agent names can be accessed through the agents property.
[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2025-06-20 19:56:33,843 gym INFO <0.00> === STARTING STEP ===
2025-06-20 19:56:33,844 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2025-06-20 19:56:33,845 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-991) tasked for imaging
2025-06-20 19:56:33,846 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-991) window enabled: 353.7 to 528.1
2025-06-20 19:56:33,847 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 528.1
2025-06-20 19:56:33,848 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2025-06-20 19:56:33,849 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-143) tasked for imaging
2025-06-20 19:56:33,850 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-143) window enabled: 426.2 to 570.8
2025-06-20 19:56:33,850 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 570.8
2025-06-20 19:56:33,851 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2025-06-20 19:56:33,852 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-294) tasked for imaging
2025-06-20 19:56:33,854 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-294) window enabled: 456.5 to 600.0
2025-06-20 19:56:33,854 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 600.0
2025-06-20 19:56:34,063 sats.satellite.EO-1 INFO <356.00> EO-1: imaged Target(tgt-991)
2025-06-20 19:56:34,066 data.base INFO <356.00> Total reward: {'EO-1': 0.08722512054173304}
2025-06-20 19:56:34,071 sats.satellite.EO-1 INFO <356.00> EO-1: Satellite EO-1 requires retasking
2025-06-20 19:56:34,072 sats.satellite.EO-1 INFO <356.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-06-20 19:56:34,116 sats.satellite.EO-2 INFO <356.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-06-20 19:56:34,152 sats.satellite.EO-3 INFO <356.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-06-20 19:56:34,186 gym INFO <356.00> Step reward: {'EO-1': 0.08722512054173304}
[14]:
observation
[14]:
{'EO-1': array([ 0.09633932, -0.01639699, 0.34747255, -0.01273877, 0.75784324,
0.01320491, 0.3357992 , 0.02432963, 0.4250096 , 0.02171044,
0.32946831, 0.05999801, 0.18078222, 0.07550929, 0.4019493 ,
0.11241186, 0.10593855, 0.10149023, 0.40956734, 0.10824072]),
'EO-2': array([0.131004 , 0.0123173 , 0.57609329, 0.03803774, 0.72383508,
0.01058828, 0.09663294, 0.03821498, 0.03189924, 0.02291746,
0.96549119, 0.03520676, 0.81484736, 0.04929659, 0.61780602,
0.04657886, 0.18263127, 0.0640945 , 0.48177414, 0.07194573]),
'EO-3': array([ 0.80433496, -0.01191084, 0.13032773, -0.00898473, 0.94249284,
0.00194162, 0.65633694, 0.01762471, 0.00499136, 0.01741947,
0.90911509, 0.01401391, 0.9364363 , 0.08355142, 0.73622325,
0.07384343, 0.23931754, 0.09790633, 0.3787856 , 0.11439878])}
Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.
[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2025-06-20 19:56:34,203 gym INFO <356.00> === STARTING STEP ===
2025-06-20 19:56:34,203 sats.satellite.EO-1 INFO <356.00> EO-1: target index 7 tasked
2025-06-20 19:56:34,204 sats.satellite.EO-1 INFO <356.00> EO-1: Target(tgt-341) tasked for imaging
2025-06-20 19:56:34,206 sats.satellite.EO-1 INFO <356.00> EO-1: Target(tgt-341) window enabled: 996.7 to 1122.3
2025-06-20 19:56:34,206 sats.satellite.EO-1 INFO <356.00> EO-1: setting timed terminal event at 1122.3
2025-06-20 19:56:34,207 sats.satellite.EO-2 INFO <356.00> EO-2: target index 9 tasked
2025-06-20 19:56:34,207 sats.satellite.EO-2 INFO <356.00> EO-2: Target(tgt-85) tasked for imaging
2025-06-20 19:56:34,209 sats.satellite.EO-2 INFO <356.00> EO-2: Target(tgt-85) window enabled: 766.1 to 974.0
2025-06-20 19:56:34,209 sats.satellite.EO-2 INFO <356.00> EO-2: setting timed terminal event at 974.0
2025-06-20 19:56:34,275 sats.satellite.EO-3 INFO <459.00> EO-3: imaged Target(tgt-294)
2025-06-20 19:56:34,277 data.base INFO <459.00> Total reward: {'EO-3': 0.6563369417104886}
2025-06-20 19:56:34,279 sats.satellite.EO-3 INFO <459.00> EO-3: Satellite EO-3 requires retasking
2025-06-20 19:56:34,283 gym INFO <459.00> Step reward: {'EO-1': -1.0, 'EO-3': 0.6563369417104886}
2025-06-20 19:56:34,284 gym INFO <459.00> Episode terminated: ['EO-1']
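Because individual agents can drop out, a rollout loop over the parallel API should rebuild its action dictionary from env.agents on every step. The sketch below shows one way to do this. ToyParallelEnv is a hypothetical stand-in so the loop can run without Basilisk, and run_parallel_episode and the policy signature are likewise illustrative. It assumes reset() returns (observations, infos) and step() returns five agent-keyed dictionaries, as in the PettingZoo parallel API.

```python
class ToyParallelEnv:
    """Tiny stand-in with a PettingZoo-parallel-style interface."""

    def __init__(self):
        self.possible_agents = ["EO-1", "EO-2", "EO-3"]
        self.agents = []
        self._t = 0

    def reset(self, seed=None, options=None):
        self.agents = list(self.possible_agents)
        self._t = 0
        return {a: [0.0] for a in self.agents}, {a: {} for a in self.agents}

    def step(self, actions):
        self._t += 1
        acting = list(self.agents)
        # EO-1 fails on step 2; every remaining agent is truncated on step 4
        terminations = {a: a == "EO-1" and self._t == 2 for a in acting}
        truncations = {a: self._t >= 4 for a in acting}
        rewards = {a: -1.0 if terminations[a] else 1.0 for a in acting}
        self.agents = [a for a in acting if not (terminations[a] or truncations[a])]
        observations = {a: [float(self._t)] for a in acting}
        return observations, rewards, terminations, truncations, {a: {} for a in acting}


def run_parallel_episode(env, policy, max_steps=100):
    """Roll out one episode and return the summed reward per agent."""
    observations, infos = env.reset()
    totals = {}
    for _ in range(max_steps):
        if not env.agents:  # every agent has terminated or been truncated
            break
        # Only agents that are still active receive an action
        actions = {agent: policy(agent, observations[agent]) for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, reward in rewards.items():
            totals[agent] = totals.get(agent, 0.0) + reward
    return totals


totals = run_parallel_episode(ToyParallelEnv(), lambda agent, obs: 0)
print(totals)
# {'EO-1': 0.0, 'EO-2': 4.0, 'EO-3': 4.0}
```

In the toy run, EO-1 earns 1.0 and then a -1.0 failure reward, while the other two agents keep accumulating reward until truncation.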