Multi-Agent Environments
The package provides two multi-agent environments:
GeneralSatelliteTasking, a Gymnasium-based environment and the basis for all other environments.
ConstellationTasking, which implements the PettingZoo parallel API.
The latter is preferable for multi-agent RL (MARL) settings, as most algorithms are designed for this kind of API.
Configuring the Environment
For this example, a multi-satellite target imaging environment is used. The goal is to maximize the total value of unique images taken.
As usual, the satellite type is defined first.
[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw


class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel
Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.
[2]:
from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
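For intuition, the geometry of a Walker-delta pattern can be sketched in a few lines. This is not the implementation of walker_delta_args: the function walker_delta_layout and its parameters (n_sats, n_planes, relative_spacing) are hypothetical and only illustrate the common convention in which planes are spread evenly in right ascension and satellites are spread evenly within each plane.

```python
def walker_delta_layout(n_sats, n_planes=1, relative_spacing=0):
    """Return (RAAN, in-plane phase) in degrees for each satellite.

    Hypothetical helper for illustration; not part of bsk_rl.
    """
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    sats_per_plane = n_sats // n_planes
    layout = []
    for plane in range(n_planes):
        # Planes are evenly spaced in right ascension of the ascending node
        raan = 360.0 * plane / n_planes
        for slot in range(sats_per_plane):
            # Even spacing within the plane, plus a per-plane phase offset
            phase = 360.0 * slot / sats_per_plane
            phase += 360.0 * relative_spacing * plane / n_sats
            layout.append((raan, phase % 360.0))
    return layout


# Three satellites in one plane, as in the example above
single_plane = walker_delta_layout(3, n_planes=1)
print(single_plane)
```

With a single plane, the pattern reduces to satellites evenly phased around one orbit, which is why the text calls this constellation trivial.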
Gym API
GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.
[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()
env.observation_space
2025-05-09 15:45:44,821 gym INFO Resetting environment with seed=1455686062
2025-05-09 15:45:44,824 scene.targets INFO Generating 1000 targets
2025-05-09 15:45:45,000 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-09 15:45:45,043 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-09 15:45:45,082 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-09 15:45:45,127 gym INFO <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))
Consequently, actions are passed as a tuple or list with one entry per satellite. The step ends as soon as any satellite completes its action.
[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2025-05-09 15:45:45,141 gym INFO <0.00> === STARTING STEP ===
2025-05-09 15:45:45,142 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2025-05-09 15:45:45,143 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-662) tasked for imaging
2025-05-09 15:45:45,145 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-662) window enabled: 95.0 to 295.0
2025-05-09 15:45:45,145 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 295.0
2025-05-09 15:45:45,146 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2025-05-09 15:45:45,147 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-0) tasked for imaging
2025-05-09 15:45:45,148 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-0) window enabled: 443.6 to 600.0
2025-05-09 15:45:45,149 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 600.0
2025-05-09 15:45:45,149 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2025-05-09 15:45:45,150 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-238) tasked for imaging
2025-05-09 15:45:45,152 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-238) window enabled: 145.5 to 354.1
2025-05-09 15:45:45,152 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 354.1
2025-05-09 15:45:45,214 sats.satellite.EO-1 INFO <98.00> EO-1: imaged Target(tgt-662)
2025-05-09 15:45:45,217 data.base INFO <98.00> Total reward: {'EO-1': 0.18368773379908132}
2025-05-09 15:45:45,220 sats.satellite.EO-1 INFO <98.00> EO-1: Satellite EO-1 requires retasking
2025-05-09 15:45:45,223 gym INFO <98.00> Step reward: 0.18368773379908132
[6]:
observation
[6]:
(array([ 0.43493988, -0.01719298, 0.93024947, -0.01193444, 0.55711046,
-0.01043602, 0.45366721, 0.02091902, 0.79078309, 0.03533112,
0.79912284, 0.03392685, 0.02086988, 0.05900815, 0.87937811,
0.04870806, 0.66766077, 0.08051888, 0.01207756, 0.0702947 ]),
array([ 1.50846487e-01, -1.71929825e-02, 1.84335734e-01, -1.48093362e-02,
5.66551402e-01, -1.71929825e-02, 3.77530651e-01, -1.30223943e-02,
2.10867542e-01, 1.25733611e-05, 4.69108023e-02, 3.02228442e-02,
7.05199628e-01, 3.40600884e-02, 4.23723512e-01, 5.74280550e-02,
3.96391125e-02, 6.06354755e-02, 3.23962939e-01, 7.18280246e-02]),
array([ 0.13790261, -0.0053917 , 0.44197374, 0.00319207, 0.83282028,
0.01402158, 0.15242568, 0.02213427, 0.47336888, 0.0083331 ,
0.9191192 , 0.01253269, 0.63752532, 0.02665542, 0.39552946,
0.03608269, 0.36356521, 0.0458701 , 0.86614125, 0.03902758]))
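The printed values appear to alternate between a priority in [0, 1] and a small normalized time, which matches the two properties requested in observation_spec for each of the 10 upcoming targets (2 x 10 = 20 entries). The following sketch decodes such a vector under that assumption. The helper decode_observation is hypothetical, and the interleaved layout is inferred from the output above rather than taken from the library documentation.

```python
def decode_observation(obs_vector, open_norm=5700.0):
    """Split a flat observation into (priority, seconds_until_open) pairs.

    Assumes the layout [priority_0, open_0, priority_1, open_1, ...],
    which is inferred from the printed values, not from the library docs.
    """
    values = list(obs_vector)
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of entries")
    pairs = []
    for i in range(0, len(values), 2):
        priority = values[i]
        seconds_until_open = values[i + 1] * open_norm  # undo norm=5700.0
        pairs.append((priority, seconds_until_open))
    return pairs


# First two upcoming targets of EO-1, copied from the output above
eo1_head = [0.43493988, -0.01719298, 0.93024947, -0.01193444]
decoded = decode_observation(eo1_head)
for priority, opens_in in decoded:
    print(f"priority={priority:.3f}, opens in {opens_in:.1f} s")
```

A negative time means the window has already opened. For the first entry this works out to roughly 98 seconds before the current simulation time of 98.00 shown in the log.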
At this point, either every satellite can be retasked, or satellites can continue their previous action by passing None as the action. To see which satellites must be retasked (i.e. their previous action is done and they have nothing more to do), look at "requires_retasking" in each satellite's info.
[7]:
info
[7]:
{'EO-1': {'requires_retasking': True},
'EO-2': {'requires_retasking': False},
'EO-3': {'requires_retasking': False},
'd_ts': 98.0}
Based on this dictionary, only the satellite that requires retasking is given a new action here; the others continue their current action.
[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[0, None, None]
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2025-05-09 15:45:45,245 gym INFO <98.00> === STARTING STEP ===
2025-05-09 15:45:45,245 sats.satellite.EO-1 INFO <98.00> EO-1: target index 0 tasked
2025-05-09 15:45:45,246 sats.satellite.EO-1 INFO <98.00> EO-1: Target(tgt-975) tasked for imaging
2025-05-09 15:45:45,247 sats.satellite.EO-1 INFO <98.00> EO-1: Target(tgt-975) window enabled: 0.0 to 177.3
2025-05-09 15:45:45,248 sats.satellite.EO-1 INFO <98.00> EO-1: setting timed terminal event at 177.3
2025-05-09 15:45:45,282 sats.satellite.EO-3 INFO <148.00> EO-3: imaged Target(tgt-238)
2025-05-09 15:45:45,285 data.base INFO <148.00> Total reward: {'EO-3': 0.4733688801446453}
2025-05-09 15:45:45,286 sats.satellite.EO-3 INFO <148.00> EO-3: Satellite EO-3 requires retasking
2025-05-09 15:45:45,289 gym INFO <148.00> Step reward: 0.4733688801446453
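The retasking pattern used above can be wrapped in a small helper that works on the info dictionary alone. This is a minimal sketch: build_actions and its choose_action callable are hypothetical names, and the info dictionary is copied from the earlier output so the example runs without the simulator.

```python
def build_actions(info, satellite_names, choose_action):
    """Return one action per satellite, or None to continue the current one.

    `choose_action` is a hypothetical policy callable: name -> int.
    """
    actions = []
    for name in satellite_names:
        if info[name]["requires_retasking"]:
            actions.append(choose_action(name))
        else:
            actions.append(None)  # keep executing the previous action
    return actions


# Info dictionary copied from the output above
info_example = {
    "EO-1": {"requires_retasking": True},
    "EO-2": {"requires_retasking": False},
    "EO-3": {"requires_retasking": False},
    "d_ts": 98.0,
}
names = ["EO-1", "EO-2", "EO-3"]
example_actions = build_actions(info_example, names, choose_action=lambda name: 0)
print(example_actions)  # [0, None, None]
```

Passing the satellite names explicitly keeps the helper from tripping over non-satellite keys such as "d_ts".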
In GeneralSatelliteTasking, the episode terminates if any agent dies. To demonstrate this, one satellite is forcibly killed.
[10]:
from Basilisk.architecture import messaging


def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )


env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])
2025-05-09 15:45:45,295 gym INFO <148.00> === STARTING STEP ===
2025-05-09 15:45:45,296 sats.satellite.EO-1 INFO <148.00> EO-1: target index 6 tasked
2025-05-09 15:45:45,297 sats.satellite.EO-1 INFO <148.00> EO-1: Target(tgt-340) tasked for imaging
2025-05-09 15:45:45,298 sats.satellite.EO-1 INFO <148.00> EO-1: Target(tgt-340) window enabled: 434.3 to 570.2
2025-05-09 15:45:45,299 sats.satellite.EO-1 INFO <148.00> EO-1: setting timed terminal event at 570.2
2025-05-09 15:45:45,299 sats.satellite.EO-2 INFO <148.00> EO-2: target index 7 tasked
2025-05-09 15:45:45,300 sats.satellite.EO-2 INFO <148.00> EO-2: Target(tgt-653) tasked for imaging
2025-05-09 15:45:45,301 sats.satellite.EO-2 INFO <148.00> EO-2: Target(tgt-653) window enabled: 425.3 to 535.8
2025-05-09 15:45:45,302 sats.satellite.EO-2 INFO <148.00> EO-2: setting timed terminal event at 535.8
2025-05-09 15:45:45,302 sats.satellite.EO-3 INFO <148.00> EO-3: target index 9 tasked
2025-05-09 15:45:45,303 sats.satellite.EO-3 INFO <148.00> EO-3: Target(tgt-440) tasked for imaging
2025-05-09 15:45:45,304 sats.satellite.EO-3 INFO <148.00> EO-3: Target(tgt-440) window enabled: 375.5 to 540.8
2025-05-09 15:45:45,305 sats.satellite.EO-3 INFO <148.00> EO-3: setting timed terminal event at 540.8
2025-05-09 15:45:45,449 sats.satellite.EO-3 INFO <378.00> EO-3: imaged Target(tgt-440)
2025-05-09 15:45:45,452 data.base INFO <378.00> Total reward: {'EO-3': 0.5603919125673439}
2025-05-09 15:45:45,455 sats.satellite.EO-3 INFO <378.00> EO-3: Satellite EO-3 requires retasking
2025-05-09 15:45:45,456 sats.satellite.EO-1 INFO <378.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-09 15:45:45,497 sats.satellite.EO-2 INFO <378.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-09 15:45:45,547 sats.satellite.EO-3 INFO <378.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-09 15:45:45,584 sats.satellite.EO-1 WARNING <378.00> EO-1: failed battery_valid check
2025-05-09 15:45:45,586 gym INFO <378.00> Step reward: -0.4396080874326561
2025-05-09 15:45:45,587 gym INFO <378.00> Episode terminated: True
2025-05-09 15:45:45,587 gym INFO <378.00> Episode truncated: False
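The logged step reward of about -0.44 is consistent with EO-3's image reward of about 0.56 plus a penalty of -1.0 for the failed satellite. The sketch below reproduces that arithmetic and the "any agent dies" termination rule. The function collapse_to_gym is a hypothetical illustration, not the bsk_rl implementation, and the -1.0 penalty is read off the logs rather than taken from the library documentation.

```python
def collapse_to_gym(rewards, alive):
    """Collapse per-satellite results into a scalar reward and one flag.

    Hypothetical illustration; not the bsk_rl implementation.
    """
    total_reward = sum(rewards.values())
    terminated = not all(alive.values())  # any dead satellite ends the episode
    return total_reward, terminated


# EO-3's reward is copied from the log; -1.0 for EO-1 is an assumed penalty
rewards = {"EO-1": -1.0, "EO-3": 0.5603919125673439}
alive = {"EO-1": False, "EO-2": True, "EO-3": True}
total_reward, terminated = collapse_to_gym(rewards, alive)
print(total_reward, terminated)
```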
PettingZoo API
The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo documentation for a full description of the API. Per-agent values are generally organized into dictionaries keyed by agent name, rather than tuples.
[11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()
env.observation_spaces
2025-05-09 15:45:45,596 WARNING Creating logger for new env on PID=4579. Old environments in process may now log times incorrectly.
2025-05-09 15:45:45,716 gym INFO Resetting environment with seed=2350446873
2025-05-09 15:45:45,718 scene.targets INFO Generating 1000 targets
2025-05-09 15:45:45,885 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-09 15:45:45,924 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-09 15:45:45,961 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-09 15:45:46,001 gym INFO <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
'EO-2': Box(-1e+16, 1e+16, (20,), float64),
'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}
Actions are passed as a dictionary; the agent names can be accessed through the agents property.
[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2025-05-09 15:45:46,014 gym INFO <0.00> === STARTING STEP ===
2025-05-09 15:45:46,015 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2025-05-09 15:45:46,015 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-869) tasked for imaging
2025-05-09 15:45:46,017 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-869) window enabled: 157.4 to 363.0
2025-05-09 15:45:46,018 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 363.0
2025-05-09 15:45:46,019 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2025-05-09 15:45:46,019 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-580) tasked for imaging
2025-05-09 15:45:46,021 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-580) window enabled: 514.3 to 600.0
2025-05-09 15:45:46,021 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 600.0
2025-05-09 15:45:46,022 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2025-05-09 15:45:46,023 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-893) tasked for imaging
2025-05-09 15:45:46,024 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-893) window enabled: 380.6 to 587.5
2025-05-09 15:45:46,024 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 587.5
2025-05-09 15:45:46,126 sats.satellite.EO-1 INFO <160.00> EO-1: imaged Target(tgt-869)
2025-05-09 15:45:46,129 data.base INFO <160.00> Total reward: {'EO-1': 0.699626753016981}
2025-05-09 15:45:46,132 sats.satellite.EO-1 INFO <160.00> EO-1: Satellite EO-1 requires retasking
2025-05-09 15:45:46,133 sats.satellite.EO-1 INFO <160.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-09 15:45:46,170 sats.satellite.EO-2 INFO <160.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-09 15:45:46,216 gym INFO <160.00> Step reward: {'EO-1': 0.699626753016981}
[14]:
observation
[14]:
{'EO-1': array([ 0.03940157, -0.022577 , 0.4074249 , -0.01184677, 0.31078485,
-0.00511768, 0.15444453, 0.0034657 , 0.15493098, 0.0334931 ,
0.25327144, 0.0600931 , 0.8659869 , 0.09386789, 0.45798376,
0.10036904, 0.65706224, 0.1146261 , 0.31218336, 0.10672817]),
'EO-2': array([ 0.18259187, -0.01897202, 0.55942136, 0.00210687, 0.61906133,
0.0108184 , 0.04700211, -0.00444765, 0.64235501, 0.02428445,
0.752648 , 0.05291728, 0.12972053, 0.05951714, 0.46985677,
0.0523609 , 0.53430127, 0.06216361, 0.95453296, 0.08007677]),
'EO-3': array([ 0.99338866, 0.00993309, 0.76460356, -0.0118841 , 0.07490146,
0.00854242, 0.37868782, 0.02586204, 0.76646804, 0.01625605,
0.70243958, 0.03444775, 0.13864359, 0.02896972, 0.5842263 ,
0.04342319, 0.03858758, 0.03870965, 0.46334271, 0.06231323])}
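Code written against the tuple-based interface can be adapted to the dictionary-based one with a pair of small conversions. The helpers to_agent_dict and to_tuple below are hypothetical conveniences, not part of bsk_rl or PettingZoo; they only assume that the agent list is in the same order as the tuple entries.

```python
def to_agent_dict(agents, values):
    """Pair each agent name with its value, as the parallel API expects."""
    if len(agents) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agents, values))


def to_tuple(agents, per_agent):
    """Order a per-agent dictionary back into a tuple following `agents`."""
    return tuple(per_agent[agent] for agent in agents)


agents = ["EO-1", "EO-2", "EO-3"]
action_dict = to_agent_dict(agents, [7, 9, 8])
print(action_dict)
print(to_tuple(agents, action_dict))
```

In a live environment the agent names would come from env.agents instead of a hard-coded list.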
Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.
[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2025-05-09 15:45:46,233 gym INFO <160.00> === STARTING STEP ===
2025-05-09 15:45:46,234 sats.satellite.EO-1 INFO <160.00> EO-1: target index 7 tasked
2025-05-09 15:45:46,234 sats.satellite.EO-1 INFO <160.00> EO-1: Target(tgt-819) tasked for imaging
2025-05-09 15:45:46,236 sats.satellite.EO-1 INFO <160.00> EO-1: Target(tgt-819) window enabled: 732.1 to 937.2
2025-05-09 15:45:46,236 sats.satellite.EO-1 INFO <160.00> EO-1: setting timed terminal event at 937.2
2025-05-09 15:45:46,237 sats.satellite.EO-2 INFO <160.00> EO-2: target index 9 tasked
2025-05-09 15:45:46,238 sats.satellite.EO-2 INFO <160.00> EO-2: Target(tgt-55) tasked for imaging
2025-05-09 15:45:46,239 sats.satellite.EO-2 INFO <160.00> EO-2: Target(tgt-55) window enabled: 616.4 to 730.4
2025-05-09 15:45:46,239 sats.satellite.EO-2 INFO <160.00> EO-2: setting timed terminal event at 730.4
2025-05-09 15:45:46,372 sats.satellite.EO-3 INFO <383.00> EO-3: imaged Target(tgt-893)
2025-05-09 15:45:46,375 data.base INFO <383.00> Total reward: {'EO-3': 0.03858757678013269}
2025-05-09 15:45:46,378 sats.satellite.EO-3 INFO <383.00> EO-3: Satellite EO-3 requires retasking
2025-05-09 15:45:46,380 sats.satellite.EO-3 INFO <383.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-09 15:45:46,424 gym INFO <383.00> Step reward: {'EO-1': -1.0, 'EO-3': 0.03858757678013269}
2025-05-09 15:45:46,425 gym INFO <383.00> Episode terminated: ['EO-1']
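Under the PettingZoo parallel convention, terminated and truncated are dictionaries keyed by agent, and agents that have finished are expected to drop out of the agent list. The sketch below shows how a training loop might compute the agents that still need actions. The helper surviving_agents is hypothetical, and the flag dictionaries are illustrative stand-ins chosen to match the "Episode terminated: ['EO-1']" log line, not captured output.

```python
def surviving_agents(agents, terminated, truncated):
    """Return the agents that are neither terminated nor truncated."""
    return [
        agent
        for agent in agents
        if not terminated.get(agent, False) and not truncated.get(agent, False)
    ]


# Illustrative flags consistent with "Episode terminated: ['EO-1']"
terminated_flags = {"EO-1": True, "EO-2": False, "EO-3": False}
truncated_flags = {"EO-1": False, "EO-2": False, "EO-3": False}
remaining = surviving_agents(
    ["EO-1", "EO-2", "EO-3"], terminated_flags, truncated_flags
)
print(remaining)  # ['EO-2', 'EO-3']
```

Only the remaining agents would be given actions on the next step, so the episode can continue with EO-2 and EO-3.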