Multi-Agent Environments
The package provides two multi-agent environments:
GeneralSatelliteTasking, a Gymnasium-based environment and the basis for all other environments.
ConstellationTasking, which implements the PettingZoo parallel API.
The latter is preferable for multi-agent RL (MARL) settings, as most algorithms are designed for this kind of API.
Configuring the Environment
For this example, a multisatellite target imaging environment will be used. The goal is to maximize the value of unique images taken.
As usual, the satellite type is defined first.
[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw
class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel
Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.
[2]:
from bsk_rl.utils.orbital import walker_delta_args
sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
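For intuition about the single-plane Walker-delta layout mentioned above, the following is a minimal sketch of the textbook t/p/f Walker-delta pattern. It is not the implementation of walker_delta_args; the function name walker_delta_phasing and its parameters are hypothetical and exist only for illustration.

```python
def walker_delta_phasing(n_sats, n_planes=1, relative_spacing=0):
    """Return (RAAN offset, in-plane anomaly offset) in degrees per satellite.

    Illustrative only: the textbook Walker-delta t/p/f pattern, not the
    code used by bsk_rl's walker_delta_args.
    """
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    sats_per_plane = n_sats // n_planes
    layout = []
    for plane in range(n_planes):
        # Planes are spread evenly in right ascension of the ascending node.
        raan = 360.0 * plane / n_planes
        for slot in range(sats_per_plane):
            # Satellites are spread evenly within a plane, with an extra
            # phase shift between adjacent planes set by relative_spacing.
            anomaly = (
                360.0 * slot / sats_per_plane
                + 360.0 * relative_spacing * plane / n_sats
            ) % 360.0
            layout.append((raan, anomaly))
    return layout


# Three satellites in one plane, as in this example's constellation size.
layout = walker_delta_phasing(n_sats=3, n_planes=1)
print(layout)
```

With a single plane, every satellite shares the same orbital plane and the pattern reduces to an even spacing along the orbit, which is why the text calls it trivial.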
Gym API
GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.
[3]:
from bsk_rl import GeneralSatelliteTasking
env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()
env.observation_space
2025-05-13 19:51:58,050 gym INFO Resetting environment with seed=1158429781
2025-05-13 19:51:58,052 scene.targets INFO Generating 1000 targets
2025-05-13 19:51:58,230 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-13 19:51:58,288 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-13 19:51:58,326 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-13 19:51:58,367 gym INFO <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))
Consequently, actions are passed as a tuple. The step will stop the first time any satellite completes an action.
[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2025-05-13 19:51:58,382 gym INFO <0.00> === STARTING STEP ===
2025-05-13 19:51:58,383 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2025-05-13 19:51:58,383 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-6) tasked for imaging
2025-05-13 19:51:58,385 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-6) window enabled: 572.6 to 600.0
2025-05-13 19:51:58,385 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 600.0
2025-05-13 19:51:58,387 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2025-05-13 19:51:58,387 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-449) tasked for imaging
2025-05-13 19:51:58,388 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-449) window enabled: 397.4 to 491.9
2025-05-13 19:51:58,388 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 491.9
2025-05-13 19:51:58,390 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2025-05-13 19:51:58,390 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-216) tasked for imaging
2025-05-13 19:51:58,391 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-216) window enabled: 247.1 to 331.4
2025-05-13 19:51:58,392 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 331.4
2025-05-13 19:51:58,538 sats.satellite.EO-3 INFO <250.00> EO-3: imaged Target(tgt-216)
2025-05-13 19:51:58,541 data.base INFO <250.00> Total reward: {'EO-3': 0.2761079670238916}
2025-05-13 19:51:58,545 sats.satellite.EO-3 INFO <250.00> EO-3: Satellite EO-3 requires retasking
2025-05-13 19:51:58,545 sats.satellite.EO-1 INFO <250.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-13 19:51:58,589 sats.satellite.EO-2 INFO <250.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-13 19:51:58,634 gym INFO <250.00> Step reward: 0.2761079670238916
[6]:
observation
[6]:
(array([ 0.16167598, -0.01061479, 0.62138934, 0.00384541, 0.59435168,
0.0077688 , 0.95552538, 0.05659047, 0.01433835, 0.05036267,
0.20765494, 0.05420718, 0.86913892, 0.0486871 , 0.11416637,
0.08727026, 0.4705042 , 0.07212848, 0.94406731, 0.06430918]),
array([ 1.77491176e-01, -1.57523342e-02, 6.74071688e-01, -2.91070002e-03,
8.11269181e-01, -8.03415709e-04, 8.45224003e-01, 2.58632350e-02,
3.00740141e-01, 2.09824364e-02, 1.28075215e-01, 4.63739246e-02,
2.24502278e-01, 6.09290088e-02, 4.87928737e-01, 7.27485858e-02,
5.87513670e-01, 7.84399608e-02, 3.97813413e-01, 8.68219932e-02]),
array([ 0.28417044, -0.02622175, 0.47416863, -0.02407967, 0.62944009,
0.00528856, 0.16815334, 0.00684775, 0.59171033, 0.0288068 ,
0.41909561, 0.01269788, 0.55108124, 0.03127186, 0.31192754,
0.04735669, 0.37949467, 0.04380215, 0.92703873, 0.03489661]))
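Each 20-element array corresponds to the two properties in observation_spec for each of the 10 upcoming targets. The sketch below shows one way to unpack such a vector. It assumes the properties are interleaved per target in the order they appear in observation_spec, and that norm is applied as a divisor; both are inferred from the printed values rather than taken from the bsk_rl source. The helper decode_opportunity_obs is hypothetical.

```python
def decode_opportunity_obs(
    flat_obs,
    props=("priority", "opportunity_open"),
    norms=(1.0, 5700.0),
):
    """Split a flat observation into one dict per upcoming target.

    Assumption: values are interleaved per target (priority, then
    opportunity_open) and each was divided by its norm.
    """
    n_props = len(props)
    if len(flat_obs) % n_props != 0:
        raise ValueError("observation length must be a multiple of len(props)")
    decoded = []
    for start in range(0, len(flat_obs), n_props):
        chunk = flat_obs[start : start + n_props]
        # Undo the normalization to recover physical units (seconds here).
        decoded.append(
            {name: value * norm for name, value, norm in zip(props, chunk, norms)}
        )
    return decoded


# First two targets of the EO-3 observation printed above.
eo3_obs_head = [0.28417044, -0.02622175, 0.47416863, -0.02407967]
targets = decode_opportunity_obs(eo3_obs_head)
for target in targets:
    print(target)
```

Under these assumptions, a negative opportunity_open would indicate a window that has already opened relative to the current simulation time.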
At this point, either every satellite can be retasked, or satellites can continue their previous action by passing None as the action. To see which satellites must be retasked (i.e., their previous action is complete and they have nothing more to do), check "requires_retasking" in each satellite's info.
[7]:
info
[7]:
{'EO-1': {'requires_retasking': False},
'EO-2': {'requires_retasking': False},
'EO-3': {'requires_retasking': True},
'd_ts': 250.00000000000003}
Based on this dictionary, only the satellite that requires retasking is given a new action; the others continue their current one.
[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[None, None, 0]
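The same pattern can be written without a live environment, which may help when testing a policy in isolation. This is a minimal sketch: retask_only_idle and choose_action are hypothetical names, and the info dictionary is copied from the output shown earlier.

```python
def retask_only_idle(sat_names, info, choose_action):
    """Build one action per satellite, in order.

    Satellites flagged with requires_retasking get a fresh action from
    choose_action(name); all others get None to continue their task.
    """
    return [
        choose_action(name) if info[name]["requires_retasking"] else None
        for name in sat_names
    ]


# Info as returned by the first step in this example.
info_example = {
    "EO-1": {"requires_retasking": False},
    "EO-2": {"requires_retasking": False},
    "EO-3": {"requires_retasking": True},
    "d_ts": 250.00000000000003,
}
new_actions = retask_only_idle(
    ["EO-1", "EO-2", "EO-3"], info_example, choose_action=lambda name: 0
)
print(new_actions)
```

Iterating over satellite names rather than over the keys of info avoids tripping on non-satellite entries such as "d_ts".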
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2025-05-13 19:51:58,655 gym INFO <250.00> === STARTING STEP ===
2025-05-13 19:51:58,656 sats.satellite.EO-3 INFO <250.00> EO-3: target index 0 tasked
2025-05-13 19:51:58,657 sats.satellite.EO-3 INFO <250.00> EO-3: Target(tgt-820) tasked for imaging
2025-05-13 19:51:58,658 sats.satellite.EO-3 INFO <250.00> EO-3: Target(tgt-820) window enabled: 100.5 to 281.9
2025-05-13 19:51:58,659 sats.satellite.EO-3 INFO <250.00> EO-3: setting timed terminal event at 281.9
2025-05-13 19:51:58,679 sats.satellite.EO-3 INFO <282.00> EO-3: timed termination at 281.9 for Target(tgt-820) window
2025-05-13 19:51:58,681 data.base INFO <282.00> Total reward: {}
2025-05-13 19:51:58,683 sats.satellite.EO-3 INFO <282.00> EO-3: Satellite EO-3 requires retasking
2025-05-13 19:51:58,685 gym INFO <282.00> Step reward: 0.0
With the Gym API, the episode terminates as soon as any satellite dies. To demonstrate this, one satellite is forcibly killed.
[10]:
from Basilisk.architecture import messaging
def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])
2025-05-13 19:51:58,691 gym INFO <282.00> === STARTING STEP ===
2025-05-13 19:51:58,692 sats.satellite.EO-1 INFO <282.00> EO-1: target index 6 tasked
2025-05-13 19:51:58,693 sats.satellite.EO-1 INFO <282.00> EO-1: Target(tgt-191) tasked for imaging
2025-05-13 19:51:58,694 sats.satellite.EO-1 INFO <282.00> EO-1: Target(tgt-191) window enabled: 527.5 to 737.1
2025-05-13 19:51:58,695 sats.satellite.EO-1 INFO <282.00> EO-1: setting timed terminal event at 737.1
2025-05-13 19:51:58,696 sats.satellite.EO-2 INFO <282.00> EO-2: target index 7 tasked
2025-05-13 19:51:58,696 sats.satellite.EO-2 INFO <282.00> EO-2: Target(tgt-990) tasked for imaging
2025-05-13 19:51:58,697 sats.satellite.EO-2 INFO <282.00> EO-2: Target(tgt-990) window enabled: 664.7 to 782.2
2025-05-13 19:51:58,698 sats.satellite.EO-2 INFO <282.00> EO-2: setting timed terminal event at 782.2
2025-05-13 19:51:58,699 sats.satellite.EO-3 INFO <282.00> EO-3: target index 9 tasked
2025-05-13 19:51:58,699 sats.satellite.EO-3 INFO <282.00> EO-3: Target(tgt-842) tasked for imaging
2025-05-13 19:51:58,701 sats.satellite.EO-3 INFO <282.00> EO-3: Target(tgt-842) window enabled: 454.9 to 600.0
2025-05-13 19:51:58,701 sats.satellite.EO-3 INFO <282.00> EO-3: setting timed terminal event at 600.0
2025-05-13 19:51:58,804 sats.satellite.EO-3 INFO <457.00> EO-3: imaged Target(tgt-842)
2025-05-13 19:51:58,807 data.base INFO <457.00> Total reward: {'EO-3': 0.7224180089048656}
2025-05-13 19:51:58,810 sats.satellite.EO-3 INFO <457.00> EO-3: Satellite EO-3 requires retasking
2025-05-13 19:51:58,811 sats.satellite.EO-3 INFO <457.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-13 19:51:58,858 sats.satellite.EO-1 WARNING <457.00> EO-1: failed battery_valid check
2025-05-13 19:51:58,860 gym INFO <457.00> Step reward: -0.2775819910951344
2025-05-13 19:51:58,860 gym INFO <457.00> Episode terminated: True
2025-05-13 19:51:58,861 gym INFO <457.00> Episode truncated: False
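The negative step reward in the log can be reproduced by hand: EO-3 earned about 0.722 for its image, and the total came out to about -0.278, which is consistent with a penalty of -1.0 for the failed satellite. The sketch below spells out that arithmetic. The penalty value is inferred from these numbers, not read from the bsk_rl source, and gym_step_reward is a hypothetical helper.

```python
FAILURE_PENALTY = -1.0  # assumption: inferred from the logged rewards


def gym_step_reward(per_sat_reward, dead_sats, failure_penalty=FAILURE_PENALTY):
    """Sum per-satellite rewards into one scalar and penalize each failure."""
    return sum(per_sat_reward.values()) + failure_penalty * len(dead_sats)


# Values taken from the log above: EO-3 imaged a target, EO-1 failed.
step_reward = gym_step_reward({"EO-3": 0.7224180089048656}, dead_sats=["EO-1"])
print(step_reward)
```

Because the Gym API returns a single scalar, the reward earned by one satellite and the penalty incurred by another cannot be told apart from the return value alone.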
PettingZoo API
The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo documentation for a full description of the API. It generally accepts and returns dictionaries keyed by agent, rather than tuples.
[11]:
from bsk_rl import ConstellationTasking
env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()
env.observation_spaces
2025-05-13 19:51:58,868 WARNING Creating logger for new env on PID=4451. Old environments in process may now log times incorrectly.
2025-05-13 19:51:58,978 gym INFO Resetting environment with seed=2086307096
2025-05-13 19:51:58,980 scene.targets INFO Generating 1000 targets
2025-05-13 19:51:59,145 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-13 19:51:59,187 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-13 19:51:59,221 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-13 19:51:59,255 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-13 19:51:59,297 gym INFO <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
'EO-2': Box(-1e+16, 1e+16, (20,), float64),
'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}
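Code written against the tuple-based Gym API can be adapted to the agent-keyed dictionaries with a pair of small converters. This is a minimal sketch in plain Python; tuple_to_agent_dict and agent_dict_to_tuple are hypothetical helpers, not part of bsk_rl or PettingZoo.

```python
def tuple_to_agent_dict(agent_names, values):
    """Pair positional values (Gym style) with agent names (PettingZoo style)."""
    if len(agent_names) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agent_names, values))


def agent_dict_to_tuple(agent_names, mapping, default=None):
    """Order a per-agent dict positionally; agents without an entry get default."""
    return tuple(mapping.get(name, default) for name in agent_names)


names = ["EO-1", "EO-2", "EO-3"]
action_dict = tuple_to_agent_dict(names, (7, 9, 8))
print(action_dict)

# An agent that is no longer present simply has no entry in the dict.
print(agent_dict_to_tuple(names, {"EO-2": 9, "EO-3": 8}))
```

The dictionary form is what makes it possible for the set of agents to shrink during an episode, since a missing key is unambiguous while a shortened tuple is not.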
Actions are passed as a dictionary; the agent names can be accessed through the agents property.
[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2025-05-13 19:51:59,310 gym INFO <0.00> === STARTING STEP ===
2025-05-13 19:51:59,310 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2025-05-13 19:51:59,311 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-701) tasked for imaging
2025-05-13 19:51:59,312 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-701) window enabled: 473.5 to 553.5
2025-05-13 19:51:59,313 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 553.5
2025-05-13 19:51:59,314 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2025-05-13 19:51:59,315 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-99) tasked for imaging
2025-05-13 19:51:59,316 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-99) window enabled: 935.9 to 1146.2
2025-05-13 19:51:59,316 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 1146.2
2025-05-13 19:51:59,317 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2025-05-13 19:51:59,318 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-967) tasked for imaging
2025-05-13 19:51:59,319 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-967) window enabled: 298.5 to 384.4
2025-05-13 19:51:59,319 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 384.4
2025-05-13 19:51:59,493 sats.satellite.EO-3 INFO <301.00> EO-3: imaged Target(tgt-967)
2025-05-13 19:51:59,495 data.base INFO <301.00> Total reward: {'EO-3': 0.929968503610453}
2025-05-13 19:51:59,500 sats.satellite.EO-3 INFO <301.00> EO-3: Satellite EO-3 requires retasking
2025-05-13 19:51:59,501 sats.satellite.EO-3 INFO <301.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-13 19:51:59,552 gym INFO <301.00> Step reward: {'EO-3': 0.929968503610453}
[14]:
observation
[14]:
{'EO-1': array([ 0.737207 , -0.02104242, 0.77729501, -0.01626501, 0.07636467,
0.00980433, 0.69623653, 0.03025835, 0.16477864, 0.03159164,
0.68221313, 0.0159778 , 0.29857745, 0.01221113, 0.25684356,
0.03605856, 0.00258467, 0.02830186, 0.62608001, 0.02402943]),
'EO-2': array([ 0.18828399, -0.03075776, 0.60157955, -0.02476508, 0.40148461,
-0.0068506 , 0.01765723, -0.00088783, 0.4369406 , 0.01965621,
0.10442904, 0.08696116, 0.50538088, 0.11426408, 0.43935914,
0.11138195, 0.79494658, 0.13762417, 0.06656677, 0.14503812]),
'EO-3': array([ 0.86824082, -0.01278442, 0.85043258, -0.01474056, 0.48860543,
0.00993235, 0.75072653, 0.01348728, 0.72907095, 0.01599991,
0.90853984, 0.02271811, 0.67557123, 0.0336629 , 0.49272773,
0.05548575, 0.3634225 , 0.05776182, 0.70156951, 0.0805398 ])}
Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.
[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2025-05-13 19:51:59,567 gym INFO <301.00> === STARTING STEP ===
2025-05-13 19:51:59,568 sats.satellite.EO-1 INFO <301.00> EO-1: target index 7 tasked
2025-05-13 19:51:59,568 sats.satellite.EO-1 INFO <301.00> EO-1: Target(tgt-926) tasked for imaging
2025-05-13 19:51:59,570 sats.satellite.EO-1 INFO <301.00> EO-1: Target(tgt-926) window enabled: 506.5 to 595.4
2025-05-13 19:51:59,570 sats.satellite.EO-1 INFO <301.00> EO-1: setting timed terminal event at 595.4
2025-05-13 19:51:59,571 sats.satellite.EO-2 INFO <301.00> EO-2: target index 9 tasked
2025-05-13 19:51:59,571 sats.satellite.EO-2 INFO <301.00> EO-2: Target(tgt-154) tasked for imaging
2025-05-13 19:51:59,573 sats.satellite.EO-2 INFO <301.00> EO-2: Target(tgt-154) window enabled: 1127.7 to 1200.0
2025-05-13 19:51:59,574 sats.satellite.EO-2 INFO <301.00> EO-2: setting timed terminal event at 1200.0
2025-05-13 19:51:59,575 sats.satellite.EO-3 WARNING <301.00> EO-3: Requires retasking but received no task.
2025-05-13 19:51:59,625 sats.satellite.EO-3 INFO <385.00> EO-3: timed termination at 384.4 for Target(tgt-967) window
2025-05-13 19:51:59,627 data.base INFO <385.00> Total reward: {}
2025-05-13 19:51:59,629 sats.satellite.EO-3 INFO <385.00> EO-3: Satellite EO-3 requires retasking
2025-05-13 19:51:59,632 gym INFO <385.00> Step reward: {'EO-1': -1.0}
2025-05-13 19:51:59,633 gym INFO <385.00> Episode terminated: ['EO-1']
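Since individual agents can terminate while the others continue, a rollout loop over a parallel-API environment typically builds actions only for the agents still listed in env.agents. The following is a minimal sketch of such a loop. StubParallelEnv is a hypothetical stand-in that mimics only the parts of the PettingZoo parallel API used here (the agents attribute and the five dictionaries returned by step); it is not ConstellationTasking, and its rewards are made up.

```python
class StubParallelEnv:
    """Tiny stand-in for a parallel-API env: EO-1 dies on its first step."""

    def __init__(self, names, horizon=3):
        self.agents = list(names)
        self.horizon = horizon
        self.steps = 0

    def step(self, actions):
        self.steps += 1
        acting = list(self.agents)
        terminations = {a: a == "EO-1" for a in acting}
        truncations = {a: self.steps >= self.horizon for a in acting}
        rewards = {a: -1.0 if terminations[a] else 1.0 for a in acting}
        observations = {a: None for a in acting}
        infos = {a: {} for a in acting}
        # Finished agents are dropped from the live-agent list.
        self.agents = [
            a for a in acting if not (terminations[a] or truncations[a])
        ]
        return observations, rewards, terminations, truncations, infos


def run_parallel_episode(env, choose_action, max_steps=100):
    """Step until no agents remain, acting only for agents still alive."""
    returns = {}
    for _ in range(max_steps):
        if not env.agents:
            break
        actions = {agent: choose_action(agent) for agent in env.agents}
        _, rewards, _, _, _ = env.step(actions)
        for agent, reward in rewards.items():
            returns[agent] = returns.get(agent, 0.0) + reward
    return returns


stub_env = StubParallelEnv(["EO-1", "EO-2", "EO-3"])
episode_returns = run_parallel_episode(stub_env, choose_action=lambda agent: 0)
print(episode_returns)
```

In the stub, EO-1 receives its penalty once and then disappears from env.agents, while EO-2 and EO-3 keep accumulating reward until the horizon is reached.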