Multi-Agent Environments
Two multi-agent environments are provided in the package:
GeneralSatelliteTasking, a Gymnasium-based environment and the basis for all other environments.
ConstellationTasking, which implements the PettingZoo parallel API.
The latter is preferable for multi-agent RL (MARL) settings, as most MARL algorithms are designed for this kind of API.
Configuring the Environment
For this example, a multi-satellite target imaging environment is used. The goal is to maximize the value of unique images taken.
As usual, the satellite type is defined first.
[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw


class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel
Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.
[2]:
from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
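To make the "trivial single-plane" case concrete, the following is a minimal sketch of the standard Walker-delta phasing rule. It is not the implementation behind walker_delta_args, which may also randomize offsets; the function name walker_delta_phasing and its parameters are hypothetical and used only for illustration.

```python
def walker_delta_phasing(n_sats, n_planes, f=0):
    """Return (RAAN, in-plane phase) in degrees for each satellite.

    Illustrative only: planes are spread evenly in RAAN, satellites are
    spread evenly within each plane, and adjacent planes are offset in
    phase by f * 360 / n_sats degrees.
    """
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    per_plane = n_sats // n_planes
    slots = []
    for plane in range(n_planes):
        raan = 360.0 * plane / n_planes
        for k in range(per_plane):
            phase = (360.0 * k / per_plane + 360.0 * f * plane / n_sats) % 360.0
            slots.append((raan, phase))
    return slots


# Three satellites in one plane: same RAAN, evenly spaced along the orbit.
single_plane = walker_delta_phasing(n_sats=3, n_planes=1)
print(single_plane)
```

With a single plane, all satellites share one orbital plane and differ only in their position along the orbit, which is why the pattern is described as trivial.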
Gym API
GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.
[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()
env.observation_space
2025-07-02 01:14:01,538 gym INFO Resetting environment with seed=785493642
2025-07-02 01:14:01,540 scene.targets INFO Generating 1000 targets
2025-07-02 01:14:01,718 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-07-02 01:14:01,761 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-07-02 01:14:01,796 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-07-02 01:14:01,835 gym INFO <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))
Consequently, actions are passed as a tuple (or, as here, a list) with one entry per satellite. The step ends the first time any satellite completes an action.
[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2025-07-02 01:14:01,849 gym INFO <0.00> === STARTING STEP ===
2025-07-02 01:14:01,850 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2025-07-02 01:14:01,851 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-714) tasked for imaging
2025-07-02 01:14:01,853 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-714) window enabled: 341.6 to 551.4
2025-07-02 01:14:01,853 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 551.4
2025-07-02 01:14:01,854 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2025-07-02 01:14:01,855 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-751) tasked for imaging
2025-07-02 01:14:01,856 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-751) window enabled: 590.3 to 600.0
2025-07-02 01:14:01,856 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 600.0
2025-07-02 01:14:01,857 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2025-07-02 01:14:01,858 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-299) tasked for imaging
2025-07-02 01:14:01,859 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-299) window enabled: 152.3 to 339.7
2025-07-02 01:14:01,859 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 339.7
2025-07-02 01:14:01,952 sats.satellite.EO-3 INFO <155.00> EO-3: imaged Target(tgt-299)
2025-07-02 01:14:01,955 data.base INFO <155.00> Total reward: {'EO-3': 0.6416072844447908}
2025-07-02 01:14:01,958 sats.satellite.EO-3 INFO <155.00> EO-3: Satellite EO-3 requires retasking
2025-07-02 01:14:01,959 sats.satellite.EO-2 INFO <155.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-07-02 01:14:02,006 gym INFO <155.00> Step reward: 0.6416072844447908
[6]:
observation
[6]:
(array([ 0.58394522, 0.00583088, 0.5486333 , -0.00678864, 0.27296251,
0.0156707 , 0.89004813, 0.00503433, 0.59759937, 0.03143374,
0.14468022, 0.02643256, 0.61881895, 0.03273163, 0.62542815,
0.05162724, 0.46339269, 0.06782605, 0.89262125, 0.05365276]),
array([ 0.80357229, -0.01461437, 0.72145336, -0.02719298, 0.9079608 ,
-0.01396104, 0.47044368, 0.03246519, 0.07308684, 0.02130962,
0.5905582 , 0.07637092, 0.11960851, 0.09812812, 0.33200891,
0.09103419, 0.96817163, 0.1072187 , 0.49443086, 0.11449962]),
array([ 0.9455274 , -0.02719298, 0.974632 , -0.01167049, 0.878912 ,
0.0163456 , 0.51664239, 0.01027752, 0.71863899, -0.00267036,
0.2423598 , 0.01463068, 0.82983575, 0.01421986, 0.13975381,
0.02521686, 0.76814849, 0.03082087, 0.15399547, 0.04330934]))
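Each 20-element vector matches the observation spec: two properties (priority and opportunity_open) for each of the 10 upcoming opportunities. The sketch below unpacks such a vector, assuming the entries are interleaved as (priority, normalized opportunity_open) pairs; that layout is inferred from the printed values rather than confirmed here, and unpack_opportunities is a hypothetical helper.

```python
def unpack_opportunities(flat_obs, n_ahead, open_norm=5700.0):
    """Split a flat observation into (priority, opportunity_open [s]) pairs.

    Assumes an interleaved layout [p0, o0, p1, o1, ...]; open_norm undoes
    the norm=5700.0 scaling from the observation spec.
    """
    if len(flat_obs) != 2 * n_ahead:
        raise ValueError("expected two entries per upcoming opportunity")
    pairs = []
    for i in range(n_ahead):
        priority = flat_obs[2 * i]
        opens_in_seconds = flat_obs[2 * i + 1] * open_norm
        pairs.append((priority, opens_in_seconds))
    return pairs


# First two opportunities of EO-1, copied from the output above.
eo1_head = [0.58394522, 0.00583088, 0.5486333, -0.00678864]
pairs = unpack_opportunities(eo1_head, n_ahead=2)
for priority, opens_in_seconds in pairs:
    print(f"priority={priority:.3f}, opportunity_open={opens_in_seconds:.1f} s")
```

Under this reading, a negative opportunity_open would correspond to a window that has already opened, though that interpretation is an assumption.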
At this point, either every satellite can be retasked, or satellites can continue their previous action by passing None as the action. To see which satellites must be retasked (i.e. their previous action is done and they have nothing more to do), look at "requires_retasking" in each satellite’s info.
[7]:
info
[7]:
{'EO-1': {'requires_retasking': False},
'EO-2': {'requires_retasking': False},
'EO-3': {'requires_retasking': True},
'd_ts': 155.0}
Based on this information, only the satellite that requires retasking is given a new action here.
[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[None, None, 0]
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2025-07-02 01:14:02,027 gym INFO <155.00> === STARTING STEP ===
2025-07-02 01:14:02,029 sats.satellite.EO-3 INFO <155.00> EO-3: target index 0 tasked
2025-07-02 01:14:02,029 sats.satellite.EO-3 INFO <155.00> EO-3: Target(tgt-496) tasked for imaging
2025-07-02 01:14:02,031 sats.satellite.EO-3 INFO <155.00> EO-3: Target(tgt-496) window enabled: 0.0 to 196.2
2025-07-02 01:14:02,031 sats.satellite.EO-3 INFO <155.00> EO-3: setting timed terminal event at 196.2
2025-07-02 01:14:02,059 sats.satellite.EO-3 INFO <197.00> EO-3: timed termination at 196.2 for Target(tgt-496) window
2025-07-02 01:14:02,062 data.base INFO <197.00> Total reward: {}
2025-07-02 01:14:02,063 sats.satellite.EO-3 INFO <197.00> EO-3: Satellite EO-3 requires retasking
2025-07-02 01:14:02,066 gym INFO <197.00> Step reward: 0.0
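The retasking pattern used above can be wrapped in a small helper. This is a sketch that works on a plain info dictionary shaped like the one printed earlier; retask_only_idle and choose_action are hypothetical names, and the stand-in policy always selects action 0.

```python
def retask_only_idle(info, sat_names, choose_action):
    """Build an action list: a new action for idle satellites, None otherwise.

    Passing None lets a satellite continue its previous action.
    """
    return [
        choose_action(name) if info[name]["requires_retasking"] else None
        for name in sat_names
    ]


# Info dictionary copied from the example output above.
example_info = {
    "EO-1": {"requires_retasking": False},
    "EO-2": {"requires_retasking": False},
    "EO-3": {"requires_retasking": True},
    "d_ts": 155.0,
}
example_actions = retask_only_idle(
    example_info, ["EO-1", "EO-2", "EO-3"], choose_action=lambda name: 0
)
print(example_actions)
```

In a real loop, choose_action would be replaced by a policy that maps each satellite's observation to an action.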
In this environment, the episode terminates if any agent dies. To demonstrate this, one satellite is forcibly killed.
[10]:
from Basilisk.architecture import messaging


def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )


env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])
2025-07-02 01:14:02,072 gym INFO <197.00> === STARTING STEP ===
2025-07-02 01:14:02,072 sats.satellite.EO-1 INFO <197.00> EO-1: target index 6 tasked
2025-07-02 01:14:02,073 sats.satellite.EO-1 INFO <197.00> EO-1: Target(tgt-714) window enabled: 341.6 to 551.4
2025-07-02 01:14:02,074 sats.satellite.EO-1 INFO <197.00> EO-1: setting timed terminal event at 551.4
2025-07-02 01:14:02,075 sats.satellite.EO-2 INFO <197.00> EO-2: target index 7 tasked
2025-07-02 01:14:02,075 sats.satellite.EO-2 INFO <197.00> EO-2: Target(tgt-655) tasked for imaging
2025-07-02 01:14:02,077 sats.satellite.EO-2 INFO <197.00> EO-2: Target(tgt-655) window enabled: 807.6 to 961.2
2025-07-02 01:14:02,078 sats.satellite.EO-2 INFO <197.00> EO-2: setting timed terminal event at 961.2
2025-07-02 01:14:02,078 sats.satellite.EO-3 INFO <197.00> EO-3: target index 9 tasked
2025-07-02 01:14:02,079 sats.satellite.EO-3 INFO <197.00> EO-3: Target(tgt-307) tasked for imaging
2025-07-02 01:14:02,080 sats.satellite.EO-3 INFO <197.00> EO-3: Target(tgt-307) window enabled: 409.0 to 600.0
2025-07-02 01:14:02,080 sats.satellite.EO-3 INFO <197.00> EO-3: setting timed terminal event at 600.0
2025-07-02 01:14:02,167 sats.satellite.EO-1 INFO <344.00> EO-1: imaged Target(tgt-714)
2025-07-02 01:14:02,170 data.base INFO <344.00> Total reward: {'EO-1': 0.618818952080729}
2025-07-02 01:14:02,173 sats.satellite.EO-1 INFO <344.00> EO-1: Satellite EO-1 requires retasking
2025-07-02 01:14:02,173 sats.satellite.EO-1 INFO <344.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-07-02 01:14:02,212 sats.satellite.EO-3 INFO <344.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-07-02 01:14:02,251 sats.satellite.EO-1 WARNING <344.00> EO-1: failed battery_valid check
2025-07-02 01:14:02,252 gym INFO <344.00> Step reward: -0.381181047919271
2025-07-02 01:14:02,253 gym INFO <344.00> Episode terminated: True
2025-07-02 01:14:02,253 gym INFO <344.00> Episode truncated: False
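The negative step reward in the log is consistent with the imaging reward earned by EO-1 being combined with a penalty of -1.0 for the failed satellite. The penalty value is inferred from the logged numbers, not taken from the library's documentation, so the check below should be read as a sanity calculation only.

```python
import math

# Values copied from the log output above.
image_reward = 0.618818952080729      # "Total reward" for EO-1
logged_step_reward = -0.381181047919271

# Assumption: a failure penalty of -1.0 is added when a satellite dies.
assumed_failure_penalty = -1.0

step_reward = image_reward + assumed_failure_penalty
matches_log = math.isclose(step_reward, logged_step_reward, rel_tol=1e-9)
print(step_reward, matches_log)
```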
PettingZoo API
The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo documentation for a full description of the API. It tends to separate things into dictionaries keyed by agent, rather than tuples.
[11]:
from bsk_rl import ConstellationTasking
env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()
env.observation_spaces
2025-07-02 01:14:02,262 WARNING Creating logger for new env on PID=4938. Old environments in process may now log times incorrectly.
2025-07-02 01:14:02,381 gym INFO Resetting environment with seed=1628301083
2025-07-02 01:14:02,383 scene.targets INFO Generating 1000 targets
2025-07-02 01:14:02,770 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-07-02 01:14:02,808 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-07-02 01:14:02,848 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-07-02 01:14:02,886 gym INFO <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
'EO-2': Box(-1e+16, 1e+16, (20,), float64),
'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}
Actions are passed as a dictionary; the agent names can be accessed through the agents property.
[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2025-07-02 01:14:02,899 gym INFO <0.00> === STARTING STEP ===
2025-07-02 01:14:02,900 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2025-07-02 01:14:02,900 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-993) tasked for imaging
2025-07-02 01:14:02,902 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-993) window enabled: 215.0 to 367.8
2025-07-02 01:14:02,902 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 367.8
2025-07-02 01:14:02,903 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2025-07-02 01:14:02,904 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-802) tasked for imaging
2025-07-02 01:14:02,905 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-802) window enabled: 69.4 to 271.8
2025-07-02 01:14:02,906 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 271.8
2025-07-02 01:14:02,906 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2025-07-02 01:14:02,907 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-644) tasked for imaging
2025-07-02 01:14:02,908 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-644) window enabled: 270.8 to 479.2
2025-07-02 01:14:02,909 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 479.2
2025-07-02 01:14:02,959 sats.satellite.EO-2 INFO <84.00> EO-2: imaged Target(tgt-802)
2025-07-02 01:14:02,962 data.base INFO <84.00> Total reward: {'EO-2': 0.5587428343768309}
2025-07-02 01:14:02,964 sats.satellite.EO-2 INFO <84.00> EO-2: Satellite EO-2 requires retasking
2025-07-02 01:14:02,968 gym INFO <84.00> Step reward: {'EO-2': 0.5587428343768309}
[14]:
observation
[14]:
{'EO-1': array([ 0.09591815, -0.01473684, 0.42148247, -0.0029456 , 0.66269136,
-0.00932026, 0.65553264, -0.00560586, 0.99350816, 0.00207762,
0.33722587, 0.00698426, 0.62635392, 0.02298049, 0.217451 ,
0.03868663, 0.01685989, 0.03910713, 0.01458292, 0.0601233 ]),
'EO-2': array([ 0.4181617 , -0.01306251, 0.90918904, -0.00632888, 0.49289212,
0.00970571, 0.08100205, -0.01222841, 0.47825195, 0.01631134,
0.37923503, 0.01095837, 0.63300146, 0.01960466, 0.9750327 ,
0.05741442, 0.54458257, 0.05644583, 0.66408997, 0.08990315]),
'EO-3': array([ 1.36628387e-01, -1.47368421e-02, 1.53926527e-02, -1.47368421e-02,
3.35413028e-01, -1.00805440e-02, 5.33949035e-01, -2.69716742e-04,
4.82181929e-01, 2.82896905e-02, 2.66839754e-01, 3.24077274e-02,
6.50650016e-02, 3.27743112e-02, 5.40025470e-01, 3.64598516e-02,
7.03223645e-01, 5.40697625e-02, 6.66980692e-02, 5.25911620e-02])}
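Since the two environments differ mainly in whether values are ordered tuples or agent-keyed dictionaries, converting between the two conventions is straightforward. The helpers below are a hypothetical sketch in plain Python and are not part of either API.

```python
def to_agent_dict(agents, values):
    """Pair an ordered sequence of values with agent names."""
    if len(agents) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agents, values))


def to_ordered_tuple(agents, by_agent, default=None):
    """Order agent-keyed values by agent list, filling gaps with default."""
    return tuple(by_agent.get(agent, default) for agent in agents)


agent_names = ["EO-1", "EO-2", "EO-3"]
as_dict = to_agent_dict(agent_names, (7, 9, 8))
as_tuple = to_ordered_tuple(agent_names, {"EO-1": 6, "EO-3": 9})
print(as_dict)
print(as_tuple)
```

The default of None in to_ordered_tuple mirrors the tuple-based convention shown earlier, where None means that a satellite continues its previous action.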
Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.
[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2025-07-02 01:14:02,984 gym INFO <84.00> === STARTING STEP ===
2025-07-02 01:14:02,985 sats.satellite.EO-1 INFO <84.00> EO-1: target index 7 tasked
2025-07-02 01:14:02,985 sats.satellite.EO-1 INFO <84.00> EO-1: Target(tgt-721) tasked for imaging
2025-07-02 01:14:02,987 sats.satellite.EO-1 INFO <84.00> EO-1: Target(tgt-721) window enabled: 304.5 to 439.0
2025-07-02 01:14:02,988 sats.satellite.EO-1 INFO <84.00> EO-1: setting timed terminal event at 439.0
2025-07-02 01:14:02,988 sats.satellite.EO-2 INFO <84.00> EO-2: target index 9 tasked
2025-07-02 01:14:02,989 sats.satellite.EO-2 INFO <84.00> EO-2: Target(tgt-878) tasked for imaging
2025-07-02 01:14:02,990 sats.satellite.EO-2 INFO <84.00> EO-2: Target(tgt-878) window enabled: 596.4 to 600.0
2025-07-02 01:14:02,991 sats.satellite.EO-2 INFO <84.00> EO-2: setting timed terminal event at 600.0
2025-07-02 01:14:03,108 sats.satellite.EO-3 INFO <273.00> EO-3: imaged Target(tgt-644)
2025-07-02 01:14:03,112 data.base INFO <273.00> Total reward: {'EO-3': 0.06506500160878925}
2025-07-02 01:14:03,116 sats.satellite.EO-3 INFO <273.00> EO-3: Satellite EO-3 requires retasking
2025-07-02 01:14:03,117 sats.satellite.EO-1 INFO <273.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-07-02 01:14:03,155 sats.satellite.EO-2 INFO <273.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-07-02 01:14:03,201 sats.access_satellite WARNING <273.00> initial_generation_duration is shorter than the maximum window length; some windows may be neglected.
2025-07-02 01:14:03,205 gym INFO <273.00> Step reward: {'EO-1': -1.0, 'EO-3': 0.06506500160878925}
2025-07-02 01:14:03,205 gym INFO <273.00> Episode terminated: ['EO-1']
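After a step like this one, subsequent actions should only be supplied for agents that are still active. In the PettingZoo parallel API, step returns per-agent termination and truncation dictionaries; the sketch below filters on dictionaries of that shape. The helper name, the stand-in policy, and the example values (chosen to mirror the log above) are assumptions for illustration.

```python
def actions_for_live_agents(agents, terminated, truncated, choose_action):
    """Return an action dict that skips terminated or truncated agents."""
    return {
        agent: choose_action(agent)
        for agent in agents
        if not (terminated.get(agent, False) or truncated.get(agent, False))
    }


# Hypothetical step results mirroring the log: only EO-1 has terminated.
example_terminated = {"EO-1": True, "EO-2": False, "EO-3": False}
example_truncated = {"EO-1": False, "EO-2": False, "EO-3": False}

next_actions = actions_for_live_agents(
    ["EO-1", "EO-2", "EO-3"],
    example_terminated,
    example_truncated,
    choose_action=lambda agent: 0,
)
print(next_actions)
```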