Multi-Agent Environments

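Two multi-agent environments are provided in the package:

- GeneralSatelliteTasking, a Gymnasium-style environment that takes and returns tuples with one entry per satellite.
- ConstellationTasking, which implements the PettingZoo parallel API and keys everything by agent name.
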
The latter is preferable for multi-agent RL (MARL) settings, as most algorithms are designed for this kind of API.

Configuring the Environment

For this example, a multi-satellite target imaging environment is used. The goal is to maximize the total value of unique images taken.
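
The rewards logged later in this example appear to equal the priority of the imaged target (the first observed property), and the reward is named for unique images. As a simplified model of that objective, and not bsk_rl's UniqueImageReward implementation, the value of a sequence of imaging events can be computed as follows. The helper, target names, and priorities are hypothetical.

```python
def unique_image_rewards(events, priorities):
    """Per-satellite reward for (satellite, target) imaging events.

    Simplified, hypothetical model: a target's priority is credited only the
    first time any satellite images it; repeat images earn nothing.
    """
    imaged = set()
    rewards = {}
    for sat, target in events:
        rewards.setdefault(sat, 0.0)
        if target not in imaged:
            imaged.add(target)
            rewards[sat] += priorities[target]
    return rewards


priorities = {"tgt-A": 0.8, "tgt-B": 0.3}  # hypothetical targets
events = [("EO-1", "tgt-A"), ("EO-2", "tgt-A"), ("EO-2", "tgt-B")]
print(unique_image_rewards(events, priorities))  # → {'EO-1': 0.8, 'EO-2': 0.3}
```

EO-2's image of tgt-A earns nothing because EO-1 already captured it, which is what makes coordination between satellites matter in this scenario.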

As usual, the satellite type is defined first.

[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give the satellites near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is created and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.

[2]:

from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
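
For intuition about what a single-plane Walker-delta layout means, the standard t/p/f pattern (t satellites, p planes, phasing f) spaces planes by 360/p degrees in right ascension, spaces satellites within a plane evenly, and offsets adjacent planes by f·360/t degrees. The hypothetical helper below computes those angles; it is a textbook sketch, not bsk_rl's walker_delta_args, and it leaves out any randomization that function may apply.

```python
def walker_delta_angles(n_sats, n_planes, phasing=0):
    """(RAAN, argument of latitude) pairs in degrees for a Walker-delta t/p/f pattern.

    Hypothetical helper: textbook geometry only, no randomization.
    """
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    per_plane = n_sats // n_planes
    angles = []
    for plane in range(n_planes):
        raan = 360.0 * plane / n_planes  # planes spread over the full 360 degrees
        for k in range(per_plane):
            # Even in-plane spacing plus the inter-plane phase offset f * 360 / t
            u = (360.0 * k / per_plane + 360.0 * phasing * plane / n_sats) % 360.0
            angles.append((raan, u))
    return angles


print(walker_delta_angles(3, 1))  # → [(0.0, 0.0), (0.0, 120.0), (0.0, 240.0)]
```

With three satellites and one plane, all satellites share an orbit and are separated by 120 degrees, which is why the layout is described as trivial.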

Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.

[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn_type must inherit from dyn.LOSDynModel
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
2024-09-12 15:06:03,760 gym                            INFO       Resetting environment with seed=1879338696
2024-09-12 15:06:03,761 scene.targets                  INFO       Generating 1000 targets
2024-09-12 15:06:03,920 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-12 15:06:03,943 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-12 15:06:03,964 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-12 15:06:03,982 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-12 15:06:04,000 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-12 15:06:04,020 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-12 15:06:04,043 gym                            INFO       <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a tuple (a list also works, as below) with one action per satellite. The step ends the first time any satellite completes its action.
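
Conceptually, the step advances simulation time to the earliest action completion, and only the satellites whose actions finished are flagged for retasking. The hypothetical helper below (not part of bsk_rl) models that rule; the completion times are made up, except that EO-1's 465.0 s matches the imaging time in the log that follows.

```python
def advance_step(now, completion_times):
    """Advance to the earliest completion; report step length and who needs retasking.

    Hypothetical model of the "stop at first completion" rule.
    """
    t_next = min(completion_times.values())
    # Only satellites whose action finished at t_next need a new action
    requires_retasking = {sat: t <= t_next for sat, t in completion_times.items()}
    return t_next, t_next - now, requires_retasking


t, dt, retask = advance_step(0.0, {"EO-1": 465.0, "EO-2": 700.0, "EO-3": 1000.0})
print(t, dt, retask)
```

The other satellites keep working on their current targets, which is why they can later be given None instead of a new action.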

[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2024-09-12 15:06:04,053 gym                            INFO       <0.00> === STARTING STEP ===
2024-09-12 15:06:04,053 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2024-09-12 15:06:04,053 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-195) tasked for imaging
2024-09-12 15:06:04,054 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-195) window enabled: 462.8 to 548.3
2024-09-12 15:06:04,054 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 548.3
2024-09-12 15:06:04,055 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2024-09-12 15:06:04,055 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-680) tasked for imaging
2024-09-12 15:06:04,056 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-680) window enabled: 646.1 to 833.3
2024-09-12 15:06:04,056 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 833.3
2024-09-12 15:06:04,056 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2024-09-12 15:06:04,057 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-211) tasked for imaging
2024-09-12 15:06:04,057 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-211) window enabled: 986.2 to 1077.6
2024-09-12 15:06:04,057 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 1077.6
2024-09-12 15:06:04,148 sats.satellite.EO-1            INFO       <465.00> EO-1: imaged Target(tgt-195)
2024-09-12 15:06:04,152 data.base                      INFO       <465.00> Data reward: {'EO-1': 0.8230500147347414, 'EO-2': 0.0, 'EO-3': 0.0}
2024-09-12 15:06:04,156 sats.satellite.EO-1            INFO       <465.00> EO-1: Satellite EO-1 requires retasking
2024-09-12 15:06:04,156 sats.satellite.EO-2            INFO       <465.00> EO-2: Finding opportunity windows from 1200.00 to 1800.00 seconds
2024-09-12 15:06:04,182 gym                            INFO       <465.00> Step reward: 0.8230500147347414
[6]:
observation
[6]:
(array([ 0.21602972, -0.00403203,  0.28284637,  0.00356078,  0.4518415 ,
         0.04613867,  0.47992008,  0.05650256,  0.64224289,  0.06190224,
         0.74877742,  0.07368187,  0.59800648,  0.05895398,  0.30639417,
         0.06799013,  0.08390608,  0.0861771 ,  0.47193369,  0.10286262]),
 array([0.88676028, 0.00289549, 0.26382734, 0.02413358, 0.33591011,
        0.03177945, 0.7800974 , 0.05823664, 0.76553764, 0.05932064,
        0.48517132, 0.09174613, 0.49501144, 0.11767415, 0.60871352,
        0.16188445, 0.53565483, 0.15520731, 0.7778392 , 0.15127552]),
 array([1.92145237e-01, 7.09354487e-04, 7.69811097e-02, 9.43758064e-03,
        8.96589817e-01, 4.59254776e-03, 9.05020147e-01, 2.69390689e-02,
        9.13063309e-01, 9.14312097e-02, 9.30135351e-01, 8.71906659e-02,
        1.55938107e-01, 7.51445464e-02, 2.97161675e-01, 8.92334327e-02,
        8.99333303e-01, 1.10183857e-01, 1.88123050e-01, 1.27600018e-01]))
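
Each satellite's observation has 20 entries: n_ahead_observe=10 upcoming targets times the two requested properties, just as Discrete(10) matches n_ahead_image=10. Judging from the values, the vector appears to interleave the properties target by target (priority, then opportunity_open divided by its norm of 5700). For EO-1, the first pair (0.216, -0.00403) would mean a priority of 0.216 and a window that opened about 23 s before the current time of 465 s, consistent with the tgt-457 window starting at 442.0 s and the 0.216 reward logged in the next step. The hypothetical helper below decodes a vector under that assumed layout.

```python
N_PROPS = 2  # priority, opportunity_open (assumed interleaved per target)


def unpack_opportunities(obs, open_norm=5700.0):
    """Decode a flat observation into (priority, seconds until window opens) pairs.

    Hypothetical helper: assumes properties are interleaved per target and that
    opportunity_open is relative to the current time, divided by open_norm.
    """
    if len(obs) % N_PROPS != 0:
        raise ValueError("observation length must be a multiple of N_PROPS")
    pairs = []
    for i in range(0, len(obs), N_PROPS):
        priority, open_scaled = obs[i], obs[i + 1]
        pairs.append((priority, open_scaled * open_norm))  # undo the normalization
    return pairs


eo1_obs = [0.21602972, -0.00403203, 0.28284637, 0.00356078]  # first two targets for EO-1
for priority, opens_in in unpack_opportunities(eo1_obs):
    print(f"priority={priority:.3f}, window opens in {opens_in:.1f} s")
```

A negative opening time means the window is already open, which fits the immediate imaging of that target in the next step.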

At this point, either every satellite can be retasked, or satellites can continue their previous action if None is passed as their action. To see which satellites must be retasked (i.e. their previous action is done and they have nothing more to do), check "requires_retasking" in each satellite’s info.

[7]:
info
[7]:
{'EO-1': {'requires_retasking': True},
 'EO-2': {'requires_retasking': False},
 'EO-3': {'requires_retasking': False},
 'd_ts': 465.00000000000006}
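
Besides one entry per satellite, info also carries 'd_ts', which appears to be the simulated time covered by the step (465 s here, matching EO-1's imaging time). Code that scans info for satellites to retask therefore has to skip non-satellite entries. A minimal sketch, with a hypothetical helper name:

```python
def satellites_needing_retasking(info):
    """Names of satellites whose info entry reports requires_retasking=True.

    Hypothetical helper: skips non-satellite entries such as 'd_ts'.
    """
    return [
        name
        for name, entry in info.items()
        if isinstance(entry, dict) and entry.get("requires_retasking", False)
    ]


info = {
    "EO-1": {"requires_retasking": True},
    "EO-2": {"requires_retasking": False},
    "EO-3": {"requires_retasking": False},
    "d_ts": 465.00000000000006,
}
print(satellites_needing_retasking(info))  # → ['EO-1']
```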

Based on this info, only the satellite that requires retasking is given a new action here; the others receive None and continue their current task.

[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[0, None, None]
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2024-09-12 15:06:04,195 gym                            INFO       <465.00> === STARTING STEP ===
2024-09-12 15:06:04,195 sats.satellite.EO-1            INFO       <465.00> EO-1: target index 0 tasked
2024-09-12 15:06:04,196 sats.satellite.EO-1            INFO       <465.00> EO-1: Target(tgt-457) tasked for imaging
2024-09-12 15:06:04,196 sats.satellite.EO-1            INFO       <465.00> EO-1: Target(tgt-457) window enabled: 442.0 to 532.9
2024-09-12 15:06:04,197 sats.satellite.EO-1            INFO       <465.00> EO-1: setting timed terminal event at 532.9
2024-09-12 15:06:04,210 sats.satellite.EO-1            INFO       <529.00> EO-1: imaged Target(tgt-457)
2024-09-12 15:06:04,212 data.base                      INFO       <529.00> Data reward: {'EO-1': 0.21602971982103714, 'EO-2': 0.0, 'EO-3': 0.0}
2024-09-12 15:06:04,213 sats.satellite.EO-1            INFO       <529.00> EO-1: Satellite EO-1 requires retasking
2024-09-12 15:06:04,214 gym                            INFO       <529.00> Step reward: 0.21602971982103714

In GeneralSatelliteTasking, the whole episode terminates if any satellite dies. To demonstrate this, one satellite is forcibly killed.

[10]:
from Basilisk.architecture import messaging

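def isnt_alive(log_failure=False):
    """Mock satellite 0 dying by reporting an empty battery."""
    sat = env.unwrapped.satellites[0]  # looked up at call time, so it follows `env`
    # Overwrite the battery status message with zero stored charge
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    sat.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    # Run the usual health checks, which now fail the battery check
    return sat.dynamics.is_alive(log_failure=log_failure) and sat.fsw.is_alive(
        log_failure=log_failure
    )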

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2024-09-12 15:06:04,218 gym                            INFO       <529.00> === STARTING STEP ===
2024-09-12 15:06:04,219 sats.satellite.EO-1            INFO       <529.00> EO-1: target index 6 tasked
2024-09-12 15:06:04,219 sats.satellite.EO-1            INFO       <529.00> EO-1: Target(tgt-507) tasked for imaging
2024-09-12 15:06:04,220 sats.satellite.EO-1            INFO       <529.00> EO-1: Target(tgt-507) window enabled: 852.5 to 1057.2
2024-09-12 15:06:04,220 sats.satellite.EO-1            INFO       <529.00> EO-1: setting timed terminal event at 1057.2
2024-09-12 15:06:04,221 sats.satellite.EO-2            INFO       <529.00> EO-2: target index 7 tasked
2024-09-12 15:06:04,221 sats.satellite.EO-2            INFO       <529.00> EO-2: Target(tgt-353) tasked for imaging
2024-09-12 15:06:04,222 sats.satellite.EO-2            INFO       <529.00> EO-2: Target(tgt-353) window enabled: 1349.7 to 1528.5
2024-09-12 15:06:04,222 sats.satellite.EO-2            INFO       <529.00> EO-2: setting timed terminal event at 1528.5
2024-09-12 15:06:04,222 sats.satellite.EO-3            INFO       <529.00> EO-3: target index 9 tasked
2024-09-12 15:06:04,223 sats.satellite.EO-3            INFO       <529.00> EO-3: Target(tgt-261) tasked for imaging
2024-09-12 15:06:04,223 sats.satellite.EO-3            INFO       <529.00> EO-3: Target(tgt-261) window enabled: 1192.3 to 1200.0
2024-09-12 15:06:04,223 sats.satellite.EO-3            INFO       <529.00> EO-3: setting timed terminal event at 1200.0
2024-09-12 15:06:04,285 sats.satellite.EO-1            INFO       <855.00> EO-1: imaged Target(tgt-507)
2024-09-12 15:06:04,287 data.base                      INFO       <855.00> Data reward: {'EO-1': 0.30639416945804554, 'EO-2': 0.0, 'EO-3': 0.0}
2024-09-12 15:06:04,290 sats.satellite.EO-1            INFO       <855.00> EO-1: Satellite EO-1 requires retasking
2024-09-12 15:06:04,291 sats.satellite.EO-1            INFO       <855.00> EO-1: Finding opportunity windows from 1200.00 to 1800.00 seconds
2024-09-12 15:06:04,312 sats.satellite.EO-1            WARNING    <855.00> EO-1: failed battery_valid check
2024-09-12 15:06:04,312 gym                            INFO       <855.00> Step reward: -0.6936058305419545
2024-09-12 15:06:04,312 gym                            INFO       <855.00> Episode terminated: True
2024-09-12 15:06:04,312 gym                            INFO       <855.00> Episode truncated: False
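
The logged step reward of -0.6936058305419545 is the data reward of 0.30639416945804554 minus 1.0, which is consistent with a failure penalty of -1 being added when a satellite dies. The sketch below models this scalar-reward, whole-episode-termination behavior; the helper name, the default penalty, and applying the penalty once per failed satellite are assumptions, not bsk_rl's code.

```python
def gym_step_outcome(data_rewards, alive, failure_penalty=-1.0):
    """Combine per-satellite rewards into one scalar and one termination flag.

    Hypothetical model: any failed satellite ends the episode, and the penalty
    is (by assumption) applied once per failed satellite.
    """
    n_failed = sum(1 for is_alive in alive.values() if not is_alive)
    terminated = n_failed > 0
    reward = sum(data_rewards.values()) + failure_penalty * n_failed
    return reward, terminated


reward, terminated = gym_step_outcome(
    {"EO-1": 0.30639416945804554, "EO-2": 0.0, "EO-3": 0.0},
    {"EO-1": False, "EO-2": True, "EO-3": True},
)
print(reward, terminated)
```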

PettingZoo API

The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking, except that it separates per-agent quantities into dictionaries keyed by agent name rather than tuples. See the PettingZoo documentation for a full description of the parallel API.
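
The difference is only in how per-satellite values are packaged. The hypothetical helpers below (not part of bsk_rl or PettingZoo) convert between the tuple form used by GeneralSatelliteTasking and the dictionary form used by ConstellationTasking, given an agent order.

```python
def tuple_to_dict(agents, values):
    """Key a per-satellite tuple (Gym style) by agent name (PettingZoo style)."""
    if len(agents) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agents, values))


def dict_to_tuple(agents, mapping):
    """Order a per-agent dict back into a tuple using the given agent order."""
    return tuple(mapping[name] for name in agents)


agents = ["EO-1", "EO-2", "EO-3"]
actions = tuple_to_dict(agents, (7, 9, 8))
print(actions)  # → {'EO-1': 7, 'EO-2': 9, 'EO-3': 8}
print(dict_to_tuple(agents, actions))  # → (7, 9, 8)
```

Keying by name rather than position means the mapping stays unambiguous when agents drop out of the episode.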

[11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn_type must inherit from dyn.LOSDynModel
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
2024-09-12 15:06:04,318                                WARNING    Creating logger for new env on PID=96022. Old environments in process may now log times incorrectly.
2024-09-12 15:06:04,501 gym                            INFO       Resetting environment with seed=2620988696
2024-09-12 15:06:04,502 scene.targets                  INFO       Generating 1000 targets
2024-09-12 15:06:04,653 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-12 15:06:04,674 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-12 15:06:04,699 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-12 15:06:04,725 gym                            INFO       <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the agents property.

[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2024-09-12 15:06:04,734 gym                            INFO       <0.00> === STARTING STEP ===
2024-09-12 15:06:04,734 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2024-09-12 15:06:04,735 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-539) tasked for imaging
2024-09-12 15:06:04,735 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-539) window enabled: 288.7 to 440.8
2024-09-12 15:06:04,736 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 440.8
2024-09-12 15:06:04,736 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2024-09-12 15:06:04,736 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-937) tasked for imaging
2024-09-12 15:06:04,737 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-937) window enabled: 357.5 to 473.6
2024-09-12 15:06:04,737 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 473.6
2024-09-12 15:06:04,737 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2024-09-12 15:06:04,738 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-926) tasked for imaging
2024-09-12 15:06:04,738 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-926) window enabled: 353.2 to 451.1
2024-09-12 15:06:04,738 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 451.1
2024-09-12 15:06:04,794 sats.satellite.EO-1            INFO       <291.00> EO-1: imaged Target(tgt-539)
2024-09-12 15:06:04,796 data.base                      INFO       <291.00> Data reward: {'EO-1': 0.662026254515778, 'EO-2': 0.0, 'EO-3': 0.0}
2024-09-12 15:06:04,799 sats.satellite.EO-1            INFO       <291.00> EO-1: Satellite EO-1 requires retasking
2024-09-12 15:06:04,800 sats.satellite.EO-1            INFO       <291.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-12 15:06:04,824 sats.satellite.EO-3            INFO       <291.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-12 15:06:04,850 gym                            INFO       <291.00> Step reward: {'EO-1': 0.662026254515778, 'EO-2': 0.0, 'EO-3': 0.0}
2024-09-12 15:06:04,850 gym                            INFO       <291.00> Episode terminated: {'EO-1': False, 'EO-2': False, 'EO-3': False}
2024-09-12 15:06:04,850 gym                            INFO       <291.00> Episode truncated: {'EO-1': False, 'EO-2': False, 'EO-3': False}
[14]:
observation
[14]:
{'EO-1': array([ 3.54079768e-01,  1.43722174e-04,  3.41016950e-01, -1.19984827e-02,
         4.59862409e-01, -3.73977166e-04,  1.57034165e-01,  4.09939092e-02,
         4.77186531e-01,  1.85235817e-02,  7.89599807e-01,  5.51169539e-02,
         4.91062428e-01,  6.76157713e-02,  4.91219311e-01,  5.56755562e-02,
         6.77410361e-01,  7.07679283e-02,  2.94042355e-01,  1.12199602e-01]),
 'EO-2': array([ 9.14702578e-01, -2.65200788e-02,  4.95920248e-01, -2.44466556e-02,
         3.61190392e-01, -5.08222404e-03,  7.70723330e-01, -2.27018729e-02,
         3.24277884e-01,  5.20187817e-04,  6.73082171e-01,  1.16583843e-02,
         9.45554102e-01,  1.19332716e-02,  3.68596535e-01,  5.59413877e-03,
         2.98544255e-01,  3.60677170e-02,  3.48486992e-01,  3.10203217e-02]),
 'EO-3': array([ 0.52332492, -0.01215394,  0.98737445,  0.01091435,  0.4451887 ,
         0.00316024,  0.0034789 ,  0.0223582 ,  0.76490943,  0.01442445,
         0.51863226,  0.02706901,  0.92739003,  0.0306299 ,  0.83661002,
         0.04446559,  0.73678664,  0.03750987,  0.85066234,  0.11141505])}

Aside from compatibility with MARL algorithms, the main benefit of the PettingZoo API is that individual agents can fail without terminating the entire environment.

[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2024-09-12 15:06:04,861 gym                            INFO       <291.00> === STARTING STEP ===
2024-09-12 15:06:04,862 sats.satellite.EO-2            INFO       <291.00> EO-2: target index 7 tasked
2024-09-12 15:06:04,862 sats.satellite.EO-2            INFO       <291.00> EO-2: Target(tgt-835) tasked for imaging
2024-09-12 15:06:04,863 sats.satellite.EO-2            INFO       <291.00> EO-2: Target(tgt-835) window enabled: 322.9 to 532.5
2024-09-12 15:06:04,863 sats.satellite.EO-2            INFO       <291.00> EO-2: setting timed terminal event at 532.5
2024-09-12 15:06:04,863 sats.satellite.EO-3            INFO       <291.00> EO-3: target index 9 tasked
2024-09-12 15:06:04,863 sats.satellite.EO-3            INFO       <291.00> EO-3: Target(tgt-686) tasked for imaging
2024-09-12 15:06:04,864 sats.satellite.EO-3            INFO       <291.00> EO-3: Target(tgt-686) window enabled: 926.1 to 1098.0
2024-09-12 15:06:04,864 sats.satellite.EO-3            INFO       <291.00> EO-3: setting timed terminal event at 1098.0
2024-09-12 15:06:04,873 sats.satellite.EO-2            INFO       <332.00> EO-2: imaged Target(tgt-835)
2024-09-12 15:06:04,875 data.base                      INFO       <332.00> Data reward: {'EO-1': 0.0, 'EO-2': 0.36859653472551324, 'EO-3': 0.0}
2024-09-12 15:06:04,876 sats.satellite.EO-2            INFO       <332.00> EO-2: Satellite EO-2 requires retasking
2024-09-12 15:06:04,881 gym                            INFO       <332.00> Step reward: {'EO-2': 0.36859653472551324, 'EO-3': 0.0}
2024-09-12 15:06:04,881 gym                            INFO       <332.00> Episode terminated: {'EO-2': False, 'EO-3': False}
2024-09-12 15:06:04,881 gym                            INFO       <332.00> Episode truncated: {'EO-2': False, 'EO-3': False}
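
After EO-1 fails, env.agents lists only EO-2 and EO-3, and the reward, termination, and truncation dictionaries of the next step contain only those two agents. The sketch below reproduces that bookkeeping with hypothetical helpers. It is simplified: PettingZoo parallel environments generally report an agent as terminated in the step where it ends and then remove it from agents, whereas here failed agents are simply filtered out.

```python
def prune_agents(agents, alive):
    """Drop failed agents, keeping the remaining order (hypothetical helper)."""
    return [name for name in agents if alive.get(name, False)]


def per_agent_outcome(agents, data_rewards, alive):
    """Per-agent reward and termination dicts for agents still in play."""
    rewards = {name: data_rewards.get(name, 0.0) for name in agents}
    terminations = {name: not alive[name] for name in agents}
    return rewards, terminations


alive = {"EO-1": False, "EO-2": True, "EO-3": True}
agents = prune_agents(["EO-1", "EO-2", "EO-3"], alive)
print(agents)  # → ['EO-2', 'EO-3']
rewards, terminations = per_agent_outcome(
    agents, {"EO-1": 0.0, "EO-2": 0.36859653472551324, "EO-3": 0.0}, alive
)
print(rewards)
print(terminations)  # → {'EO-2': False, 'EO-3': False}
```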