Multi-Agent Environments

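Two multi-agent environments are provided in the package: GeneralSatelliteTasking, which follows the Gymnasium API, and ConstellationTasking, which follows the PettingZoo parallel API.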

The latter is preferable for multi-agent RL (MARL) settings, as most algorithms are designed for this kind of API.

Configuring the Environment

For this example, a multisatellite target imaging environment will be used. The goal is to maximize the value of unique images taken.
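One way to picture this objective: each target's priority is credited only the first time any satellite images it, and repeat images of the same target earn nothing. The sketch below shows that bookkeeping under that assumption. The function name and the `(satellite, target)` event format are hypothetical. This is a simplified model, not the implementation of `data.UniqueImageReward`.

```python
from collections import defaultdict


def unique_image_rewards(events, priorities):
    """Credit each target's priority to the first satellite that images it.

    Hypothetical simplified model: `events` is a time-ordered list of
    (satellite_name, target_id) pairs and `priorities` maps target_id to value.
    """
    seen = set()  # targets already imaged by any satellite
    rewards = defaultdict(float)
    for sat, target in events:
        if target in seen:
            continue  # duplicate image: no additional value
        seen.add(target)
        rewards[sat] += priorities[target]
    return dict(rewards)


events = [("EO-1", "tgt-1"), ("EO-2", "tgt-1"), ("EO-2", "tgt-2")]
priorities = {"tgt-1": 0.5, "tgt-2": 0.25}
print(unique_image_rewards(events, priorities))
# {'EO-1': 0.5, 'EO-2': 0.25}
```

Here EO-2's image of `tgt-1` earns nothing because EO-1 already captured it, which is what makes coordination between satellites worthwhile.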

As usual, the satellite type is defined first.

[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give each satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is created and passed to the environment. Here, it distributes the satellites in a trivial single-plane Walker-delta constellation.

[2]:

from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
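In a Walker-delta i:t/p/f pattern, the p planes are spread evenly in right ascension of the ascending node, satellites within a plane are spread evenly in anomaly, and adjacent planes are offset by f·360/t degrees. With a single plane, as configured here, the satellites simply share one orbit at equal spacing and the phasing parameter has no effect. The hypothetical helper below computes those slots for illustration only. It is not the implementation of `walker_delta_args`, which is responsible for producing the actual orbit arguments for each satellite.

```python
def walker_delta_phases(n_sats, n_planes=1, phasing=0):
    """Return (RAAN_deg, anomaly_deg) for each satellite of a Walker-delta i:t/p/f pattern.

    Hypothetical helper for illustration: t = n_sats, p = n_planes, f = phasing.
    """
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    per_plane = n_sats // n_planes
    slots = []
    for plane in range(n_planes):
        raan = 360.0 * plane / n_planes  # planes spread evenly in RAAN
        for k in range(per_plane):
            # Even in-plane spacing plus the inter-plane offset f * 360 / t
            anomaly = (360.0 * k / per_plane + 360.0 * phasing * plane / n_sats) % 360.0
            slots.append((raan, anomaly))
    return slots


print(walker_delta_phases(3))  # three satellites in one plane
# [(0.0, 0.0), (0.0, 120.0), (0.0, 240.0)]
```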

Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.

[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn_type must inherit from dyn.LOSCommDynModel
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
2025-09-30 17:48:20,446 gym                            INFO       Resetting environment with seed=2963567368
2025-09-30 17:48:20,449 scene.targets                  INFO       Generating 1000 targets
2025-09-30 17:48:20,625 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-09-30 17:48:20,668 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-09-30 17:48:20,709 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-09-30 17:48:20,747 gym                            INFO       <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a tuple (a list also works, as below). The step ends the first time any satellite completes its action.
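Conceptually, each satellite's current action finishes at some simulation time, and the step advances only to the earliest of those times. The elapsed time corresponds to `d_ts` in `info`, and the satellites that finished are the ones flagged for retasking. The sketch below mimics that rule with a hypothetical helper. The inputs reuse EO-1's imaging time and the other satellites' window-close times from the log of the step below. It is a toy model, not the bsk_rl simulator.

```python
def advance_to_first_completion(now, completion_times):
    """Advance time to the earliest action completion.

    Hypothetical model: `completion_times` maps satellite name to the absolute
    time its current action finishes. Returns the step duration and the
    satellites that need a new action.
    """
    t_next = min(completion_times.values())
    needs_retasking = [name for name, t in completion_times.items() if t <= t_next]
    return t_next - now, needs_retasking


d_ts, retask = advance_to_first_completion(
    0.0, {"EO-1": 141.0, "EO-2": 347.2, "EO-3": 476.7}
)
print(d_ts, retask)  # 141.0 ['EO-1']
```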

[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2025-09-30 17:48:20,760 gym                            INFO       <0.00> === STARTING STEP ===
2025-09-30 17:48:20,761 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-09-30 17:48:20,761 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-489) tasked for imaging
2025-09-30 17:48:20,763 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-489) window enabled: 138.9 to 278.7
2025-09-30 17:48:20,764 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 278.7
2025-09-30 17:48:20,765 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-09-30 17:48:20,766 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-585) tasked for imaging
2025-09-30 17:48:20,767 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-585) window enabled: 218.8 to 347.2
2025-09-30 17:48:20,768 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 347.2
2025-09-30 17:48:20,769 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-09-30 17:48:20,859 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-40) tasked for imaging
2025-09-30 17:48:20,861 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-40) window enabled: 375.7 to 476.7
2025-09-30 17:48:20,862 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 476.7
2025-09-30 17:48:20,951 sats.satellite.EO-1            INFO       <141.00> EO-1: imaged Target(tgt-489)
2025-09-30 17:48:20,955 data.base                      INFO       <141.00> Total reward: {'EO-1': 0.06625522936015571}
2025-09-30 17:48:20,956 sats.satellite.EO-1            INFO       <141.00> EO-1: Satellite EO-1 requires retasking
2025-09-30 17:48:20,959 gym                            INFO       <141.00> Step reward: 0.06625522936015571
[6]:
observation
[6]:
(array([ 5.96160627e-01, -2.47368421e-02,  7.68173258e-01, -2.28006291e-02,
         3.67870132e-01,  7.67142860e-03,  5.44535719e-01, -1.19943798e-02,
         7.11265941e-01, -1.17838606e-02,  2.29846142e-01,  2.32341688e-02,
         5.41148380e-01, -3.78573322e-04,  9.43570707e-02,  3.94154573e-03,
         7.34632261e-01,  1.93755082e-02,  5.00811913e-01,  2.92735093e-02]),
 array([ 0.9798527 , -0.02473684,  0.43253013, -0.01275244,  0.41522719,
        -0.02473684,  0.80125187, -0.00875428,  0.06804937, -0.01714867,
         0.79330325,  0.0075926 ,  0.2229885 ,  0.00602685,  0.47947086,
         0.01364086,  0.11559405,  0.02232233,  0.32041838,  0.02046235]),
 array([ 0.7676276 , -0.02473684,  0.10737763, -0.02473684,  0.66024491,
        -0.00523304,  0.01709735,  0.00920612,  0.85628165, -0.00269747,
         0.77822536,  0.00305593,  0.63997663,  0.01511992,  0.25257213,
         0.04118007,  0.17017131,  0.05102272,  0.92872856,  0.07899817]))
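Each 20-element vector interleaves two properties (priority and opportunity_open) for the n_ahead_observe=10 upcoming targets. The printed values suggest that opportunity_open is the window opening time relative to the current simulation time, divided by the norm of 5700.0. For example, EO-1's first target window opens at 0.0 s, and at t = 141 s the second entry of its vector is about -141/5700 ≈ -0.0247. The sketch below rebuilds a vector under that assumption. The function and its inputs are hypothetical.

```python
def opportunity_observation(opportunities, sim_time, n_ahead=10, norm=5700.0):
    """Flatten (priority, window_open_time) pairs into one observation vector.

    Hypothetical sketch: assumes opportunity_open is reported relative to the
    current simulation time and divided by `norm`.
    """
    vec = []
    for priority, window_open in opportunities[:n_ahead]:
        vec.append(priority)
        vec.append((window_open - sim_time) / norm)
    return vec


# Two opportunities seen at t = 141 s: one open since 0 s, one opening at 240 s
obs_vec = opportunity_observation([(0.6, 0.0), (0.77, 240.0)], sim_time=141.0)
print(obs_vec)
```

With 10 opportunities, the vector has 2 × 10 = 20 entries, matching the (20,) Box shape of each observation space.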

At this point, every satellite can be retasked, or any satellite can continue its previous action if None is passed as its action. To see which satellites must be retasked (i.e., their previous action is done and they have nothing more to do), check "requires_retasking" in each satellite's entry of info.

[7]:
info
[7]:
{'EO-1': {'requires_retasking': True},
 'EO-2': {'requires_retasking': False},
 'EO-3': {'requires_retasking': False},
 'd_ts': 141.0}

Based on this information, only the satellite that requires it is retasked here.

[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[0, None, None]
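The same pattern works with any policy: query it only for satellites flagged with requires_retasking and pass None for the rest. The sketch below uses a seeded random choice over the 10 discrete actions as a stand-in policy. `choose_actions` is a hypothetical helper, and the info dictionary is copied from the output above. Distinct variable names are used so that running the sketch does not overwrite `info` or `actions` in the notebook.

```python
import random


def choose_actions(info, sat_names, policy):
    """Return one action per satellite, or None to let it continue its current task."""
    return [policy(name) if info[name]["requires_retasking"] else None for name in sat_names]


sketch_info = {
    "EO-1": {"requires_retasking": True},
    "EO-2": {"requires_retasking": False},
    "EO-3": {"requires_retasking": False},
    "d_ts": 141.0,
}
rng = random.Random(0)
sketch_actions = choose_actions(
    sketch_info, ["EO-1", "EO-2", "EO-3"], lambda name: rng.randrange(10)
)
print(sketch_actions)  # an integer in [0, 10) for EO-1, then None, None
```

In the notebook, the satellite names would come from `[sat.name for sat in env.unwrapped.satellites]`, as in the list comprehension above.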
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2025-09-30 17:48:20,980 gym                            INFO       <141.00> === STARTING STEP ===
2025-09-30 17:48:20,980 sats.satellite.EO-1            INFO       <141.00> EO-1: target index 0 tasked
2025-09-30 17:48:20,981 sats.satellite.EO-1            INFO       <141.00> EO-1: Target(tgt-123) tasked for imaging
2025-09-30 17:48:20,983 sats.satellite.EO-1            INFO       <141.00> EO-1: Target(tgt-123) window enabled: 0.0 to 160.8
2025-09-30 17:48:20,983 sats.satellite.EO-1            INFO       <141.00> EO-1: setting timed terminal event at 160.8
2025-09-30 17:48:20,997 sats.satellite.EO-1            INFO       <161.00> EO-1: timed termination at 160.8 for Target(tgt-123) window
2025-09-30 17:48:21,001 data.base                      INFO       <161.00> Total reward: {}
2025-09-30 17:48:21,002 sats.satellite.EO-1            INFO       <161.00> EO-1: Satellite EO-1 requires retasking
2025-09-30 17:48:21,004 gym                            INFO       <161.00> Step reward: 0.0

In GeneralSatelliteTasking, the episode terminates if any satellite dies. To demonstrate this, one satellite is forcibly killed by replacing its is_alive check with a mock that empties its battery.

[10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    # Overwrite the battery status message with an empty battery
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    # Run the usual health checks, which now fail the battery check
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2025-09-30 17:48:21,010 gym                            INFO       <161.00> === STARTING STEP ===
2025-09-30 17:48:21,011 sats.satellite.EO-1            INFO       <161.00> EO-1: target index 6 tasked
2025-09-30 17:48:21,012 sats.satellite.EO-1            INFO       <161.00> EO-1: Target(tgt-836) tasked for imaging
2025-09-30 17:48:21,013 sats.satellite.EO-1            INFO       <161.00> EO-1: Target(tgt-836) window enabled: 163.5 to 365.9
2025-09-30 17:48:21,014 sats.satellite.EO-1            INFO       <161.00> EO-1: setting timed terminal event at 365.9
2025-09-30 17:48:21,014 sats.satellite.EO-2            INFO       <161.00> EO-2: target index 7 tasked
2025-09-30 17:48:21,016 sats.satellite.EO-2            INFO       <161.00> EO-2: Target(tgt-585) window enabled: 218.8 to 347.2
2025-09-30 17:48:21,016 sats.satellite.EO-2            INFO       <161.00> EO-2: setting timed terminal event at 347.2
2025-09-30 17:48:21,017 sats.satellite.EO-3            INFO       <161.00> EO-3: target index 9 tasked
2025-09-30 17:48:21,018 sats.satellite.EO-3            INFO       <161.00> EO-3: Target(tgt-684) tasked for imaging
2025-09-30 17:48:21,020 sats.satellite.EO-3            INFO       <161.00> EO-3: Target(tgt-684) window enabled: 495.8 to 600.0
2025-09-30 17:48:21,020 sats.satellite.EO-3            INFO       <161.00> EO-3: setting timed terminal event at 600.0
2025-09-30 17:48:21,058 sats.satellite.EO-1            INFO       <221.00> EO-1: imaged Target(tgt-836)
2025-09-30 17:48:21,059 sats.satellite.EO-2            INFO       <221.00> EO-2: imaged Target(tgt-585)
2025-09-30 17:48:21,063 data.base                      INFO       <221.00> Total reward: {'EO-1': 0.09435707073145894, 'EO-2': 0.47947086178516385}
2025-09-30 17:48:21,063 sats.satellite.EO-1            INFO       <221.00> EO-1: Satellite EO-1 requires retasking
2025-09-30 17:48:21,064 sats.satellite.EO-2            INFO       <221.00> EO-2: Satellite EO-2 requires retasking
2025-09-30 17:48:21,066 sats.satellite.EO-1            WARNING    <221.00> EO-1: failed battery_valid check
2025-09-30 17:48:21,067 gym                            INFO       <221.00> Step reward: -0.4261720674833772
2025-09-30 17:48:21,068 gym                            INFO       <221.00> Episode terminated: True
2025-09-30 17:48:21,068 gym                            INFO       <221.00> Episode truncated: False

PettingZoo API

ConstellationTasking, the PettingZoo parallel API environment, is configured in largely the same way as GeneralSatelliteTasking. The main difference is that it separates per-agent quantities into dictionaries keyed by agent name rather than tuples. See the PettingZoo documentation for a full description of the parallel API.
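The correspondence between the two layouts is mechanical: element i of a GeneralSatelliteTasking tuple belongs to the i-th satellite, and ConstellationTasking stores the same value under that satellite's name. The hypothetical helpers below convert between the two layouts, assuming the satellite order is known.

```python
def tuple_to_agent_dict(values, agent_names):
    """Map the i-th element of a per-satellite tuple to the i-th agent name."""
    if len(values) != len(agent_names):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agent_names, values))


def agent_dict_to_tuple(values, agent_names, default=None):
    """Inverse direction; agents missing from the dict get `default`."""
    return tuple(values.get(name, default) for name in agent_names)


names = ["EO-1", "EO-2", "EO-3"]
print(tuple_to_agent_dict((7, 9, 8), names))  # {'EO-1': 7, 'EO-2': 9, 'EO-3': 8}
print(agent_dict_to_tuple({"EO-1": 7, "EO-2": 9}, names))  # (7, 9, None)
```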

[11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn_type must inherit from dyn.LOSCommDynModel
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
2025-09-30 17:48:21,076                                WARNING    Creating logger for new env on PID=4762. Old environments in process may now log times incorrectly.
2025-09-30 17:48:21,192 gym                            INFO       Resetting environment with seed=3709250730
2025-09-30 17:48:21,194 scene.targets                  INFO       Generating 1000 targets
2025-09-30 17:48:21,359 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-09-30 17:48:21,396 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-09-30 17:48:21,432 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-09-30 17:48:21,467 gym                            INFO       <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the agents property.
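Building the dictionary with a comprehension over the agent list ensures that every listed agent receives an action. An agent left out of the dictionary is not retasked, and if it needed a new task the environment logs a warning, as happens for EO-3 later in this example. In the sketch, `sketch_agents` is a plain list standing in for env.agents, and `build_action_dict` and the fixed policy are hypothetical placeholders.

```python
def build_action_dict(agents, policy):
    """Give every listed agent an action chosen by `policy`."""
    return {agent: policy(agent) for agent in agents}


sketch_agents = ["EO-1", "EO-2", "EO-3"]  # stands in for env.agents
fixed_choice = {"EO-1": 7, "EO-2": 9, "EO-3": 8}
action_dict = build_action_dict(sketch_agents, fixed_choice.get)
print(action_dict)  # {'EO-1': 7, 'EO-2': 9, 'EO-3': 8}
```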

[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2025-09-30 17:48:21,480 gym                            INFO       <0.00> === STARTING STEP ===
2025-09-30 17:48:21,481 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-09-30 17:48:21,481 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-381) tasked for imaging
2025-09-30 17:48:21,483 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-381) window enabled: 201.4 to 367.7
2025-09-30 17:48:21,484 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 367.7
2025-09-30 17:48:21,485 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-09-30 17:48:21,485 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-671) tasked for imaging
2025-09-30 17:48:21,487 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-671) window enabled: 563.1 to 600.0
2025-09-30 17:48:21,487 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 600.0
2025-09-30 17:48:21,488 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-09-30 17:48:21,489 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-577) tasked for imaging
2025-09-30 17:48:21,490 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-577) window enabled: 192.2 to 399.6
2025-09-30 17:48:21,490 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 399.6
2025-09-30 17:48:21,614 sats.satellite.EO-3            INFO       <195.00> EO-3: imaged Target(tgt-577)
2025-09-30 17:48:21,618 data.base                      INFO       <195.00> Total reward: {'EO-3': 0.15326445221937124}
2025-09-30 17:48:21,619 sats.satellite.EO-3            INFO       <195.00> EO-3: Satellite EO-3 requires retasking
2025-09-30 17:48:21,621 sats.satellite.EO-1            INFO       <195.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-09-30 17:48:21,680 sats.satellite.EO-2            INFO       <195.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-09-30 17:48:21,722 sats.satellite.EO-3            INFO       <195.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-09-30 17:48:21,761 gym                            INFO       <195.00> Step reward: {'EO-3': 0.15326445221937124}
[14]:
observation
[14]:
{'EO-1': array([0.56967147, 0.0071239 , 0.33920383, 0.00500981, 0.36917193,
        0.00184067, 0.61642625, 0.00112843, 0.24763518, 0.0023755 ,
        0.14780834, 0.00856776, 0.04707673, 0.04331589, 0.2797925 ,
        0.06802278, 0.4617153 , 0.10395737, 0.91777235, 0.08596873]),
 'EO-2': array([ 0.10394463, -0.02329034,  0.15342063, -0.02437549,  0.75589251,
        -0.01757136,  0.96356998, -0.00642668,  0.25522149,  0.02767246,
         0.63715122,  0.06457167,  0.40714651,  0.04128555,  0.58133586,
         0.09266912,  0.92512736,  0.09399403,  0.38789411,  0.10162427]),
 'EO-3': array([ 0.51977146, -0.02958271,  0.00948394, -0.03377893,  0.14562524,
        -0.00736226,  0.37265606, -0.00821826,  0.10008347,  0.01669924,
         0.97989127,  0.00389112,  0.86135334,  0.06844423,  0.2146969 ,
         0.05589427,  0.33503488,  0.05920801,  0.21180228,  0.11516294])}

Besides compatibility with MARL algorithms, the main benefit of the PettingZoo API is that individual agents can fail without terminating the entire environment.
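The logs in this notebook show the contrast: GeneralSatelliteTasking reports a single terminated flag that becomes True once any satellite fails, while ConstellationTasking reports which agents terminated. Following the usual PettingZoo convention, the remaining agents stay active. The sketch below models only that bookkeeping with a hypothetical `alive_status` dictionary. It does not reproduce the failure penalties visible in the rewards.

```python
def gym_style_termination(alive):
    """Single flag: the whole episode ends if any satellite has failed."""
    return not all(alive.values())


def pettingzoo_style_termination(alive):
    """Per-agent flags plus the agents that remain active."""
    terminated = {agent: not ok for agent, ok in alive.items()}
    remaining = [agent for agent, ok in alive.items() if ok]
    return terminated, remaining


alive_status = {"EO-1": False, "EO-2": True, "EO-3": True}  # EO-1 forcibly failed
print(gym_style_termination(alive_status))  # True
print(pettingzoo_style_termination(alive_status))
# ({'EO-1': True, 'EO-2': False, 'EO-3': False}, ['EO-2', 'EO-3'])
```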

[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2025-09-30 17:48:21,777 gym                            INFO       <195.00> === STARTING STEP ===
2025-09-30 17:48:21,778 sats.satellite.EO-1            INFO       <195.00> EO-1: target index 7 tasked
2025-09-30 17:48:21,778 sats.satellite.EO-1            INFO       <195.00> EO-1: Target(tgt-849) tasked for imaging
2025-09-30 17:48:21,780 sats.satellite.EO-1            INFO       <195.00> EO-1: Target(tgt-849) window enabled: 582.7 to 751.6
2025-09-30 17:48:21,780 sats.satellite.EO-1            INFO       <195.00> EO-1: setting timed terminal event at 751.6
2025-09-30 17:48:21,781 sats.satellite.EO-2            INFO       <195.00> EO-2: target index 9 tasked
2025-09-30 17:48:21,782 sats.satellite.EO-2            INFO       <195.00> EO-2: Target(tgt-733) tasked for imaging
2025-09-30 17:48:21,784 sats.satellite.EO-2            INFO       <195.00> EO-2: Target(tgt-733) window enabled: 774.3 to 940.7
2025-09-30 17:48:21,785 sats.satellite.EO-2            INFO       <195.00> EO-2: setting timed terminal event at 940.7
2025-09-30 17:48:21,786 sats.satellite.EO-3            WARNING    <195.00> EO-3: Requires retasking but received no task.
2025-09-30 17:48:21,909 sats.satellite.EO-3            INFO       <400.00> EO-3: timed termination at 399.6 for Target(tgt-577) window
2025-09-30 17:48:21,914 data.base                      INFO       <400.00> Total reward: {}
2025-09-30 17:48:21,914 sats.satellite.EO-3            INFO       <400.00> EO-3: Satellite EO-3 requires retasking
2025-09-30 17:48:21,918 gym                            INFO       <400.00> Step reward: {'EO-1': -1.0}
2025-09-30 17:48:21,918 gym                            INFO       <400.00> Episode terminated: ['EO-1']