Multi-Agent Environments

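Two multi-agent environments are provided in the package:

- GeneralSatelliteTasking, a Gymnasium environment whose actions and observations are tuples with one entry per satellite.
- ConstellationTasking, a PettingZoo parallel environment whose actions, observations, and other per-agent data are dictionaries keyed by agent name.

The latter is preferable for multi-agent RL (MARL) settings, as most MARL algorithms are designed for this kind of API.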
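The practical difference between the two APIs is how per-satellite data is packaged. Here is a rough sketch of the translation between them. The helper names are hypothetical and are not part of bsk_rl. A tuple ordered like the satellite list maps to a dictionary keyed by agent name, and back:

```python
from typing import Any, Dict, Sequence, Tuple


def tuple_to_dict(agent_names: Sequence[str], values: Sequence[Any]) -> Dict[str, Any]:
    """Hypothetical helper: key tuple entries (ordered like the satellites) by agent name."""
    if len(agent_names) != len(values):
        raise ValueError("expected exactly one value per agent")
    return dict(zip(agent_names, values))


def dict_to_tuple(agent_names: Sequence[str], values: Dict[str, Any]) -> Tuple[Any, ...]:
    """Hypothetical inverse: order a per-agent dict like the satellite list.

    Agents missing from the dict become None in this sketch.
    """
    return tuple(values.get(name) for name in agent_names)


names = ["EO-1", "EO-2", "EO-3"]
print(tuple_to_dict(names, (7, 9, 8)))  # {'EO-1': 7, 'EO-2': 9, 'EO-3': 8}
print(dict_to_tuple(names, {"EO-1": 7, "EO-2": 9}))  # (7, 9, None)
```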

Configuring the Environment

For this example, a multi-satellite target imaging environment is used. The goal is to maximize the value of unique images taken.
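The following simplified sketch shows what "value of unique images" means. It is an assumption about the reward's behavior, not the source of bsk_rl's UniqueImageReward. Each target carries a priority, and a satellite earns that priority only the first time any satellite images the target. Repeat images of the same target are worth nothing. The target IDs and priorities below are made up for illustration:

```python
from typing import Dict, Iterable, Set, Tuple


def unique_image_rewards(
    images: Iterable[Tuple[str, str]],
    priorities: Dict[str, float],
    already_imaged: Set[str],
) -> Dict[str, float]:
    """Sum target priorities per satellite, counting each target only once.

    images: (satellite_name, target_id) pairs in the order they were taken.
    already_imaged: targets imaged earlier in the episode; updated in place.
    """
    rewards: Dict[str, float] = {}
    for sat, target in images:
        if target in already_imaged:
            continue  # duplicate image of a known target: no reward
        already_imaged.add(target)
        rewards[sat] = rewards.get(sat, 0.0) + priorities[target]
    return rewards


priorities = {"tgt-1": 0.5, "tgt-2": 0.25}  # hypothetical targets
seen: Set[str] = set()
print(unique_image_rewards(
    [("EO-1", "tgt-1"), ("EO-2", "tgt-1"), ("EO-2", "tgt-2")], priorities, seen
))  # {'EO-1': 0.5, 'EO-2': 0.25}
print(unique_image_rewards([("EO-3", "tgt-2")], priorities, seen))  # {}
```

Only satellites that earned something appear in the result, much like the "Total reward" lines in the logs below.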

As usual, the satellite type is defined first.

[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.

[2]:

from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)

sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
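A single-plane Walker-delta constellation simply spaces the satellites evenly around one orbit. The sketch below computes the textbook Walker-delta t/p/f slot layout, with t satellites, p planes, and phasing factor f. It is not the walker_delta_args implementation, whose exact conventions (for example, which anomaly it sets) are not shown here:

```python
from typing import List, Tuple


def walker_delta_slots(t: int, p: int, f: int = 0) -> List[Tuple[float, float]]:
    """Return (RAAN, in-plane anomaly) in degrees for a Walker-delta t/p/f pattern.

    Textbook geometry only; not bsk_rl's walker_delta_args implementation.
    """
    if t % p != 0:
        raise ValueError("t must be divisible by p")
    s = t // p  # satellites per plane
    slots = []
    for j in range(p):  # plane index: planes spread evenly over 360 degrees of RAAN
        raan = 360.0 * j / p
        for k in range(s):  # slot index within the plane, offset by the phasing term
            anomaly = (360.0 * k / s + 360.0 * f * j / t) % 360.0
            slots.append((raan, anomaly))
    return slots


# Three satellites in one plane, as in the environments below
print(walker_delta_slots(t=3, p=1))  # [(0.0, 0.0), (0.0, 120.0), (0.0, 240.0)]
```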

Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.

[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Requires dyn_type to inherit from dyn.LOSCommDynModel
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
2025-05-09 15:45:44,821 gym                            INFO       Resetting environment with seed=1455686062
2025-05-09 15:45:44,824 scene.targets                  INFO       Generating 1000 targets
2025-05-09 15:45:45,000 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-09 15:45:45,043 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-09 15:45:45,082 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-09 15:45:45,127 gym                            INFO       <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a tuple (or list) with one entry per satellite. Each step ends as soon as any satellite completes its action.
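The following is a minimal sketch of that stopping rule. It assumes that each satellite's current action finishes at a known simulation time. In the real environment, completion is driven by simulation events such as an image being taken or a timed terminal event firing. The function name is hypothetical:

```python
from typing import Dict, List, Tuple


def advance_to_first_completion(
    now: float, completion_times: Dict[str, float]
) -> Tuple[float, List[str]]:
    """Jump to the earliest action completion and report who finished.

    completion_times maps satellite name -> absolute simulation time (s) at which
    its current action ends. Returns (elapsed seconds, names that finished then).
    """
    t_next = min(completion_times.values())
    finished = sorted(name for name, t in completion_times.items() if t == t_next)
    return t_next - now, finished


# Times borrowed from the logs: EO-1 imaged at 98 s, EO-3 at 148 s,
# and EO-2's timed terminal event was set at 600 s.
elapsed, finished = advance_to_first_completion(
    0.0, {"EO-1": 98.0, "EO-2": 600.0, "EO-3": 148.0}
)
print(elapsed, finished)  # 98.0 ['EO-1']
```

Satellites that did not finish keep working on their actions. The elapsed time here plays the same role that d_ts appears to play in the info dictionary.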

[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2025-05-09 15:45:45,141 gym                            INFO       <0.00> === STARTING STEP ===
2025-05-09 15:45:45,142 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-05-09 15:45:45,143 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-662) tasked for imaging
2025-05-09 15:45:45,145 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-662) window enabled: 95.0 to 295.0
2025-05-09 15:45:45,145 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 295.0
2025-05-09 15:45:45,146 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-05-09 15:45:45,147 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-0) tasked for imaging
2025-05-09 15:45:45,148 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-0) window enabled: 443.6 to 600.0
2025-05-09 15:45:45,149 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 600.0
2025-05-09 15:45:45,149 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-05-09 15:45:45,150 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-238) tasked for imaging
2025-05-09 15:45:45,152 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-238) window enabled: 145.5 to 354.1
2025-05-09 15:45:45,152 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 354.1
2025-05-09 15:45:45,214 sats.satellite.EO-1            INFO       <98.00> EO-1: imaged Target(tgt-662)
2025-05-09 15:45:45,217 data.base                      INFO       <98.00> Total reward: {'EO-1': 0.18368773379908132}
2025-05-09 15:45:45,220 sats.satellite.EO-1            INFO       <98.00> EO-1: Satellite EO-1 requires retasking
2025-05-09 15:45:45,223 gym                            INFO       <98.00> Step reward: 0.18368773379908132
[6]:
observation
[6]:
(array([ 0.43493988, -0.01719298,  0.93024947, -0.01193444,  0.55711046,
        -0.01043602,  0.45366721,  0.02091902,  0.79078309,  0.03533112,
         0.79912284,  0.03392685,  0.02086988,  0.05900815,  0.87937811,
         0.04870806,  0.66766077,  0.08051888,  0.01207756,  0.0702947 ]),
 array([ 1.50846487e-01, -1.71929825e-02,  1.84335734e-01, -1.48093362e-02,
         5.66551402e-01, -1.71929825e-02,  3.77530651e-01, -1.30223943e-02,
         2.10867542e-01,  1.25733611e-05,  4.69108023e-02,  3.02228442e-02,
         7.05199628e-01,  3.40600884e-02,  4.23723512e-01,  5.74280550e-02,
         3.96391125e-02,  6.06354755e-02,  3.23962939e-01,  7.18280246e-02]),
 array([ 0.13790261, -0.0053917 ,  0.44197374,  0.00319207,  0.83282028,
         0.01402158,  0.15242568,  0.02213427,  0.47336888,  0.0083331 ,
         0.9191192 ,  0.01253269,  0.63752532,  0.02665542,  0.39552946,
         0.03608269,  0.36356521,  0.0458701 ,  0.86614125,  0.03902758]))
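Each satellite's observation is a flat vector. Two properties (priority and opportunity_open) for each of the n_ahead_observe=10 upcoming targets give the 20 entries of the Box spaces above. The sketch below assumes the values are interleaved per target and undoes the 5700 normalization. The helper is hypothetical, not part of bsk_rl. The assumption fits EO-1's first pair: the simulation is at t = 98 s here, and -0.01719298 × 5700 ≈ -98 s, which reads as a window that opened at t = 0:

```python
from typing import List, Sequence, Tuple

OPEN_NORM = 5700.0  # the norm given to "opportunity_open" in observation_spec


def unpack_opportunities(obs: Sequence[float]) -> List[Tuple[float, float]]:
    """Split a flat observation into (priority, seconds_until_open) per target.

    Assumes values are interleaved per target as
    (priority, opportunity_open / OPEN_NORM).
    """
    if len(obs) % 2 != 0:
        raise ValueError("expected an even number of values (2 per target)")
    return [(obs[i], obs[i + 1] * OPEN_NORM) for i in range(0, len(obs), 2)]


# EO-1's observation from cell [6]
eo1_obs = [
    0.43493988, -0.01719298, 0.93024947, -0.01193444, 0.55711046,
    -0.01043602, 0.45366721, 0.02091902, 0.79078309, 0.03533112,
    0.79912284, 0.03392685, 0.02086988, 0.05900815, 0.87937811,
    0.04870806, 0.66766077, 0.08051888, 0.01207756, 0.0702947,
]
for priority, t_open in unpack_opportunities(eo1_obs)[:2]:
    print(f"priority={priority:.3f}, opens in {t_open:.1f} s")
# priority=0.435, opens in -98.0 s
# priority=0.930, opens in -68.0 s
```

Under this reading, a negative opening time means the target's window is already open.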

At this point, each satellite can either be retasked or continue its previous action if None is passed as its action. To see which satellites must be retasked (i.e., their previous action is done and they have nothing more to do), check "requires_retasking" in each satellite's entry of info.

[7]:
info
[7]:
{'EO-1': {'requires_retasking': True},
 'EO-2': {'requires_retasking': False},
 'EO-3': {'requires_retasking': False},
 'd_ts': 98.0}

Based on this, only the satellite that needs it is retasked here.

[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[0, None, None]
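The same pattern extends to any policy. In this sketch, policy is a hypothetical callable standing in for a learned agent, and retask_actions is likewise not a bsk_rl function. Because info also holds non-satellite entries such as d_ts, the loop goes over the satellite names rather than over info itself:

```python
import random
from typing import Any, Callable, Dict, List, Optional, Sequence


def retask_actions(
    sat_names: Sequence[str],
    info: Dict[str, Any],
    policy: Callable[[str], int],
) -> List[Optional[int]]:
    """Query policy only for satellites that require retasking.

    Satellites that are still busy get None, i.e. they continue their
    previous action. Looping over sat_names (not info) skips extra keys
    such as "d_ts".
    """
    return [
        policy(name) if info[name]["requires_retasking"] else None
        for name in sat_names
    ]


info = {
    "EO-1": {"requires_retasking": True},
    "EO-2": {"requires_retasking": False},
    "EO-3": {"requires_retasking": False},
    "d_ts": 98.0,
}
names = ["EO-1", "EO-2", "EO-3"]

print(retask_actions(names, info, lambda name: 0))  # [0, None, None]

# A random policy over the 10 discrete actions (seeded for repeatability)
rng = random.Random(0)
print(retask_actions(names, info, lambda name: rng.randrange(10)))
```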
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2025-05-09 15:45:45,245 gym                            INFO       <98.00> === STARTING STEP ===
2025-05-09 15:45:45,245 sats.satellite.EO-1            INFO       <98.00> EO-1: target index 0 tasked
2025-05-09 15:45:45,246 sats.satellite.EO-1            INFO       <98.00> EO-1: Target(tgt-975) tasked for imaging
2025-05-09 15:45:45,247 sats.satellite.EO-1            INFO       <98.00> EO-1: Target(tgt-975) window enabled: 0.0 to 177.3
2025-05-09 15:45:45,248 sats.satellite.EO-1            INFO       <98.00> EO-1: setting timed terminal event at 177.3
2025-05-09 15:45:45,282 sats.satellite.EO-3            INFO       <148.00> EO-3: imaged Target(tgt-238)
2025-05-09 15:45:45,285 data.base                      INFO       <148.00> Total reward: {'EO-3': 0.4733688801446453}
2025-05-09 15:45:45,286 sats.satellite.EO-3            INFO       <148.00> EO-3: Satellite EO-3 requires retasking
2025-05-09 15:45:45,289 gym                            INFO       <148.00> Step reward: 0.4733688801446453

In this environment, the whole episode terminates if any satellite dies. To demonstrate this, one satellite is forcibly killed.

[10]:
from Basilisk.architecture import messaging

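def isnt_alive(log_failure=False):
    """Mock satellite 0 dying by draining its battery."""
    # Looks up the global `env` at call time, so it also works with a later env
    sat = env.unwrapped.satellites[0]
    # Write a battery status message reporting zero stored charge
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    sat.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    # Run the usual dynamics and FSW health checks; the battery check now fails
    return sat.dynamics.is_alive(log_failure=log_failure) and sat.fsw.is_alive(
        log_failure=log_failure
    )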

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2025-05-09 15:45:45,295 gym                            INFO       <148.00> === STARTING STEP ===
2025-05-09 15:45:45,296 sats.satellite.EO-1            INFO       <148.00> EO-1: target index 6 tasked
2025-05-09 15:45:45,297 sats.satellite.EO-1            INFO       <148.00> EO-1: Target(tgt-340) tasked for imaging
2025-05-09 15:45:45,298 sats.satellite.EO-1            INFO       <148.00> EO-1: Target(tgt-340) window enabled: 434.3 to 570.2
2025-05-09 15:45:45,299 sats.satellite.EO-1            INFO       <148.00> EO-1: setting timed terminal event at 570.2
2025-05-09 15:45:45,299 sats.satellite.EO-2            INFO       <148.00> EO-2: target index 7 tasked
2025-05-09 15:45:45,300 sats.satellite.EO-2            INFO       <148.00> EO-2: Target(tgt-653) tasked for imaging
2025-05-09 15:45:45,301 sats.satellite.EO-2            INFO       <148.00> EO-2: Target(tgt-653) window enabled: 425.3 to 535.8
2025-05-09 15:45:45,302 sats.satellite.EO-2            INFO       <148.00> EO-2: setting timed terminal event at 535.8
2025-05-09 15:45:45,302 sats.satellite.EO-3            INFO       <148.00> EO-3: target index 9 tasked
2025-05-09 15:45:45,303 sats.satellite.EO-3            INFO       <148.00> EO-3: Target(tgt-440) tasked for imaging
2025-05-09 15:45:45,304 sats.satellite.EO-3            INFO       <148.00> EO-3: Target(tgt-440) window enabled: 375.5 to 540.8
2025-05-09 15:45:45,305 sats.satellite.EO-3            INFO       <148.00> EO-3: setting timed terminal event at 540.8
2025-05-09 15:45:45,449 sats.satellite.EO-3            INFO       <378.00> EO-3: imaged Target(tgt-440)
2025-05-09 15:45:45,452 data.base                      INFO       <378.00> Total reward: {'EO-3': 0.5603919125673439}
2025-05-09 15:45:45,455 sats.satellite.EO-3            INFO       <378.00> EO-3: Satellite EO-3 requires retasking
2025-05-09 15:45:45,456 sats.satellite.EO-1            INFO       <378.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-09 15:45:45,497 sats.satellite.EO-2            INFO       <378.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-09 15:45:45,547 sats.satellite.EO-3            INFO       <378.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-09 15:45:45,584 sats.satellite.EO-1            WARNING    <378.00> EO-1: failed battery_valid check
2025-05-09 15:45:45,586 gym                            INFO       <378.00> Step reward: -0.4396080874326561
2025-05-09 15:45:45,587 gym                            INFO       <378.00> Episode terminated: True
2025-05-09 15:45:45,587 gym                            INFO       <378.00> Episode truncated: False
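The negative step reward can be accounted for from the logs. This assumes the Gym environment sums per-satellite rewards and applies a -1.0 penalty to the failed satellite, which is the value EO-1 receives on failure in the PettingZoo example. Under that assumption, the step reward is EO-3's imaging reward plus the penalty:

```python
import math

imaging_reward = 0.5603919125673439  # EO-3 imaged tgt-440 (from the log above)
failure_penalty = -1.0  # assumed; matches EO-1's reward on failure in the PettingZoo run
step_reward = imaging_reward + failure_penalty

print(f"{step_reward:.4f}")  # -0.4396
assert math.isclose(step_reward, -0.4396080874326561, rel_tol=1e-12)
```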

PettingZoo API

The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking; see their respective documentation for a full description of each API. The main difference is that per-agent data is organized into dictionaries keyed by agent name rather than tuples.

[11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Requires dyn_type to inherit from dyn.LOSCommDynModel
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
2025-05-09 15:45:45,596                                WARNING    Creating logger for new env on PID=4579. Old environments in process may now log times incorrectly.
2025-05-09 15:45:45,716 gym                            INFO       Resetting environment with seed=2350446873
2025-05-09 15:45:45,718 scene.targets                  INFO       Generating 1000 targets
2025-05-09 15:45:45,885 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-09 15:45:45,924 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-09 15:45:45,961 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-05-09 15:45:46,001 gym                            INFO       <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the agents property.

[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2025-05-09 15:45:46,014 gym                            INFO       <0.00> === STARTING STEP ===
2025-05-09 15:45:46,015 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-05-09 15:45:46,015 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-869) tasked for imaging
2025-05-09 15:45:46,017 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-869) window enabled: 157.4 to 363.0
2025-05-09 15:45:46,018 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 363.0
2025-05-09 15:45:46,019 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-05-09 15:45:46,019 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-580) tasked for imaging
2025-05-09 15:45:46,021 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-580) window enabled: 514.3 to 600.0
2025-05-09 15:45:46,021 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 600.0
2025-05-09 15:45:46,022 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-05-09 15:45:46,023 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-893) tasked for imaging
2025-05-09 15:45:46,024 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-893) window enabled: 380.6 to 587.5
2025-05-09 15:45:46,024 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 587.5
2025-05-09 15:45:46,126 sats.satellite.EO-1            INFO       <160.00> EO-1: imaged Target(tgt-869)
2025-05-09 15:45:46,129 data.base                      INFO       <160.00> Total reward: {'EO-1': 0.699626753016981}
2025-05-09 15:45:46,132 sats.satellite.EO-1            INFO       <160.00> EO-1: Satellite EO-1 requires retasking
2025-05-09 15:45:46,133 sats.satellite.EO-1            INFO       <160.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-09 15:45:46,170 sats.satellite.EO-2            INFO       <160.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-09 15:45:46,216 gym                            INFO       <160.00> Step reward: {'EO-1': 0.699626753016981}
[14]:
observation
[14]:
{'EO-1': array([ 0.03940157, -0.022577  ,  0.4074249 , -0.01184677,  0.31078485,
        -0.00511768,  0.15444453,  0.0034657 ,  0.15493098,  0.0334931 ,
         0.25327144,  0.0600931 ,  0.8659869 ,  0.09386789,  0.45798376,
         0.10036904,  0.65706224,  0.1146261 ,  0.31218336,  0.10672817]),
 'EO-2': array([ 0.18259187, -0.01897202,  0.55942136,  0.00210687,  0.61906133,
         0.0108184 ,  0.04700211, -0.00444765,  0.64235501,  0.02428445,
         0.752648  ,  0.05291728,  0.12972053,  0.05951714,  0.46985677,
         0.0523609 ,  0.53430127,  0.06216361,  0.95453296,  0.08007677]),
 'EO-3': array([ 0.99338866,  0.00993309,  0.76460356, -0.0118841 ,  0.07490146,
         0.00854242,  0.37868782,  0.02586204,  0.76646804,  0.01625605,
         0.70243958,  0.03444775,  0.13864359,  0.02896972,  0.5842263 ,
         0.04342319,  0.03858758,  0.03870965,  0.46334271,  0.06231323])}

Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.

[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        # No entry for env.agents[2] (EO-3): it keeps working on its previous task
    }
)
2025-05-09 15:45:46,233 gym                            INFO       <160.00> === STARTING STEP ===
2025-05-09 15:45:46,234 sats.satellite.EO-1            INFO       <160.00> EO-1: target index 7 tasked
2025-05-09 15:45:46,234 sats.satellite.EO-1            INFO       <160.00> EO-1: Target(tgt-819) tasked for imaging
2025-05-09 15:45:46,236 sats.satellite.EO-1            INFO       <160.00> EO-1: Target(tgt-819) window enabled: 732.1 to 937.2
2025-05-09 15:45:46,236 sats.satellite.EO-1            INFO       <160.00> EO-1: setting timed terminal event at 937.2
2025-05-09 15:45:46,237 sats.satellite.EO-2            INFO       <160.00> EO-2: target index 9 tasked
2025-05-09 15:45:46,238 sats.satellite.EO-2            INFO       <160.00> EO-2: Target(tgt-55) tasked for imaging
2025-05-09 15:45:46,239 sats.satellite.EO-2            INFO       <160.00> EO-2: Target(tgt-55) window enabled: 616.4 to 730.4
2025-05-09 15:45:46,239 sats.satellite.EO-2            INFO       <160.00> EO-2: setting timed terminal event at 730.4
2025-05-09 15:45:46,372 sats.satellite.EO-3            INFO       <383.00> EO-3: imaged Target(tgt-893)
2025-05-09 15:45:46,375 data.base                      INFO       <383.00> Total reward: {'EO-3': 0.03858757678013269}
2025-05-09 15:45:46,378 sats.satellite.EO-3            INFO       <383.00> EO-3: Satellite EO-3 requires retasking
2025-05-09 15:45:46,380 sats.satellite.EO-3            INFO       <383.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-05-09 15:45:46,424 gym                            INFO       <383.00> Step reward: {'EO-1': -1.0, 'EO-3': 0.03858757678013269}
2025-05-09 15:45:46,425 gym                            INFO       <383.00> Episode terminated: ['EO-1']
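Under the PettingZoo parallel convention, agents that are done are expected to drop out of env.agents, so later action dictionaries only need to cover the surviving satellites. The sketch below shows that bookkeeping over per-agent terminated and truncated dictionaries. The helper is hypothetical, and the flags are illustrative values mirroring the log above rather than data captured from the run:

```python
from typing import Dict, List, Sequence


def live_agents(
    agents: Sequence[str],
    terminated: Dict[str, bool],
    truncated: Dict[str, bool],
) -> List[str]:
    """Agents that neither terminated nor were truncated on the last step."""
    return [
        a
        for a in agents
        if not terminated.get(a, False) and not truncated.get(a, False)
    ]


# Illustrative flags: only EO-1 terminated, as in the log above
agents = ["EO-1", "EO-2", "EO-3"]
terminated = {"EO-1": True, "EO-2": False, "EO-3": False}
truncated = {"EO-1": False, "EO-2": False, "EO-3": False}

remaining = live_agents(agents, terminated, truncated)
print(remaining)  # ['EO-2', 'EO-3']

# The next action dictionary only needs entries for the remaining agents
next_actions = {agent: 0 for agent in remaining}
print(next_actions)  # {'EO-2': 0, 'EO-3': 0}
```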