Multi-Agent Environments

The package provides two multi-agent environments:

GeneralSatelliteTasking, a Gymnasium-based environment and the basis for all other environments.
ConstellationTasking, which implements the PettingZoo parallel API.

ConstellationTasking is preferable for multi-agent RL (MARL) settings, since most MARL algorithms are designed around this kind of API.

Configuring the Environment

This example uses a multi-satellite target imaging environment. The goal is to maximize the total value of unique images taken.

As usual, the satellite type is defined first.

[1]:

from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.

[2]:

from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
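To make the "single-plane Walker-delta" layout concrete, the sketch below computes the textbook Walker-delta angles for a small constellation. It is an illustration only and not the implementation behind walker_delta_args; the helper name walker_delta_angles and its parameters are hypothetical.

```python
def walker_delta_angles(n_sats, n_planes=1, phasing=0):
    """Return (RAAN, true anomaly) pairs in degrees, one per satellite.

    Textbook Walker-delta pattern t/p/f. Hypothetical helper for
    illustration; this is not part of the bsk_rl API.
    """
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    per_plane = n_sats // n_planes
    angles = []
    for plane in range(n_planes):
        # Planes are spread evenly in right ascension of the ascending node.
        raan = 360.0 * plane / n_planes
        for slot in range(per_plane):
            # Satellites are spread evenly within a plane, with an
            # inter-plane phase offset of phasing * 360 / n_sats per plane.
            anomaly = 360.0 * slot / per_plane + 360.0 * phasing * plane / n_sats
            angles.append((raan, anomaly % 360.0))
    return angles


# Three satellites in one plane end up 120 degrees apart on the same orbit.
angles = walker_delta_angles(3, n_planes=1)
print(angles)
```

With a single plane, the pattern reduces to evenly spaced satellites on one orbit, which is why the text calls it trivial.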

Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.

[3]:

from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space

2025-07-24 00:11:28,763 gym                            INFO       Resetting environment with seed=3906111869
2025-07-24 00:11:28,766 scene.targets                  INFO       Generating 1000 targets
2025-07-24 00:11:28,943 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-07-24 00:11:28,991 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-07-24 00:11:29,029 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-07-24 00:11:29,069 gym                            INFO       <0.00> Environment reset

[3]:

Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))

[4]:

env.action_space

[4]:

Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a sequence with one entry per satellite, in the same order as the satellites list (a plain list is used here). The step ends the first time any satellite completes an action.

[5]:

observation, reward, terminated, truncated, info = env.step([7, 9, 8])

2025-07-24 00:11:29,084 gym                            INFO       <0.00> === STARTING STEP ===
2025-07-24 00:11:29,084 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-07-24 00:11:29,085 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-495) tasked for imaging
2025-07-24 00:11:29,087 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-495) window enabled: 351.2 to 519.2
2025-07-24 00:11:29,087 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 519.2
2025-07-24 00:11:29,088 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-07-24 00:11:29,089 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-567) tasked for imaging
2025-07-24 00:11:29,090 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-567) window enabled: 215.2 to 403.8
2025-07-24 00:11:29,090 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 403.8
2025-07-24 00:11:29,091 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-07-24 00:11:29,092 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-769) tasked for imaging
2025-07-24 00:11:29,093 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-769) window enabled: 261.0 to 402.9
2025-07-24 00:11:29,094 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 402.9
2025-07-24 00:11:29,234 sats.satellite.EO-2            INFO       <218.00> EO-2: imaged Target(tgt-567)
2025-07-24 00:11:29,238 data.base                      INFO       <218.00> Total reward: {'EO-2': 0.2978612814408208}
2025-07-24 00:11:29,240 sats.satellite.EO-2            INFO       <218.00> EO-2: Satellite EO-2 requires retasking
2025-07-24 00:11:29,240 sats.satellite.EO-1            INFO       <218.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-07-24 00:11:29,280 gym                            INFO       <218.00> Step reward: 0.2978612814408208

[6]:

observation

[6]:

(array([ 0.0260755 , -0.02067075,  0.08534221, -0.0123288 ,  0.90207526,
        -0.01154835,  0.76175644,  0.02336662,  0.42411203,  0.01943874,
         0.78355632,  0.03150457,  0.22904888,  0.04559814,  0.7198488 ,
         0.06487512,  0.43861631,  0.03578045,  0.48769155,  0.09004693]),
 array([ 0.41729251, -0.02520064,  0.08411587, -0.01691761,  0.5989434 ,
         0.00829291,  0.49854501, -0.00807138,  0.93762817, -0.00272554,
         0.3256186 ,  0.00844306,  0.40049261,  0.03002897,  0.61499358,
         0.05989572,  0.60137596,  0.04353885,  0.24007406,  0.06332535]),
 array([ 0.45745169, -0.03049792,  0.07707695, -0.02703612,  0.34844109,
        -0.00318943,  0.71173915, -0.00516661,  0.5941078 ,  0.00754839,
         0.12419356,  0.01243843,  0.71959515,  0.01727564,  0.12889472,
         0.0372256 ,  0.31860735,  0.0543901 ,  0.50252752,  0.04459114]))
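Each 20-element vector can be read as 10 upcoming opportunities with 2 properties each. The sketch below assumes the values are interleaved as (priority, opportunity_open) pairs and that normalized values were divided by norm; both are inferences from the numbers shown here rather than documented behavior, and unpack_opportunities is a hypothetical helper.

```python
def unpack_opportunities(
    flat_obs,
    props=("priority", "opportunity_open"),
    norms=(1.0, 5700.0),
):
    """Split a flat observation into one dict per upcoming opportunity.

    Assumes an interleaved layout [p0, o0, p1, o1, ...] and that each value
    was divided by its norm (hypothetical helper, not the bsk_rl API).
    """
    width = len(props)
    if len(flat_obs) % width != 0:
        raise ValueError("observation length is not a multiple of len(props)")
    opportunities = []
    for start in range(0, len(flat_obs), width):
        chunk = flat_obs[start:start + width]
        opportunities.append(
            {name: value * norm for name, value, norm in zip(props, chunk, norms)}
        )
    return opportunities


# First ten entries of the EO-3 observation printed above (five opportunities).
eo3_head = [
    0.45745169, -0.03049792, 0.07707695, -0.02703612, 0.34844109,
    -0.00318943, 0.71173915, -0.00516661, 0.5941078, 0.00754839,
]
opportunities = unpack_opportunities(eo3_head)
for index, opportunity in enumerate(opportunities):
    print(index, opportunity)
```

Under this reading, the fifth EO-3 opportunity has priority 0.594 and opens about 43 seconds after the step ended at 218 s, which agrees with the logged tgt-769 window opening at 261.0 and the reward EO-3 later receives for imaging it. Negative opportunity_open values would then indicate windows that are already open.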

At this point, either every satellite can be retasked, or satellites can continue their previous action by passing None as the action. To see which satellites must be retasked (i.e. their previous action is done and they have nothing more to do), look at "requires_retasking" in each satellite’s info.

[7]:

info

[7]:

{'EO-1': {'requires_retasking': False},
 'EO-2': {'requires_retasking': True},
 'EO-3': {'requires_retasking': False},
 'd_ts': 218.0}

Based on this dictionary, only the satellite that requires retasking is given a new action here; the others are passed None so that they continue their current action.

[8]:

actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions

[8]:

[None, 0, None]

[9]:

observation, reward, terminated, truncated, info = env.step(actions)

2025-07-24 00:11:29,301 gym                            INFO       <218.00> === STARTING STEP ===
2025-07-24 00:11:29,303 sats.satellite.EO-2            INFO       <218.00> EO-2: target index 0 tasked
2025-07-24 00:11:29,303 sats.satellite.EO-2            INFO       <218.00> EO-2: Target(tgt-393) tasked for imaging
2025-07-24 00:11:29,304 sats.satellite.EO-2            INFO       <218.00> EO-2: Target(tgt-393) window enabled: 74.4 to 267.3
2025-07-24 00:11:29,305 sats.satellite.EO-2            INFO       <218.00> EO-2: setting timed terminal event at 267.3
2025-07-24 00:11:29,334 sats.satellite.EO-3            INFO       <264.00> EO-3: imaged Target(tgt-769)
2025-07-24 00:11:29,338 data.base                      INFO       <264.00> Total reward: {'EO-3': 0.5941077999250646}
2025-07-24 00:11:29,339 sats.satellite.EO-3            INFO       <264.00> EO-3: Satellite EO-3 requires retasking
2025-07-24 00:11:29,342 gym                            INFO       <264.00> Step reward: 0.5941077999250646

With the Gym API, the episode terminates if any satellite dies. To demonstrate this, the is_alive check of one satellite is replaced with a mock that empties its battery.

[10]:

from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2025-07-24 00:11:29,348 gym                            INFO       <264.00> === STARTING STEP ===
2025-07-24 00:11:29,349 sats.satellite.EO-1            INFO       <264.00> EO-1: target index 6 tasked
2025-07-24 00:11:29,349 sats.satellite.EO-1            INFO       <264.00> EO-1: Target(tgt-221) tasked for imaging
2025-07-24 00:11:29,351 sats.satellite.EO-1            INFO       <264.00> EO-1: Target(tgt-221) window enabled: 477.9 to 687.1
2025-07-24 00:11:29,351 sats.satellite.EO-1            INFO       <264.00> EO-1: setting timed terminal event at 687.1
2025-07-24 00:11:29,352 sats.satellite.EO-2            INFO       <264.00> EO-2: target index 7 tasked
2025-07-24 00:11:29,353 sats.satellite.EO-2            INFO       <264.00> EO-2: Target(tgt-230) tasked for imaging
2025-07-24 00:11:29,354 sats.satellite.EO-2            INFO       <264.00> EO-2: Target(tgt-230) window enabled: 559.4 to 600.0
2025-07-24 00:11:29,355 sats.satellite.EO-2            INFO       <264.00> EO-2: setting timed terminal event at 600.0
2025-07-24 00:11:29,356 sats.satellite.EO-3            INFO       <264.00> EO-3: target index 9 tasked
2025-07-24 00:11:29,356 sats.satellite.EO-3            INFO       <264.00> EO-3: Target(tgt-202) tasked for imaging
2025-07-24 00:11:29,357 sats.satellite.EO-3            INFO       <264.00> EO-3: Target(tgt-202) window enabled: 559.4 to 600.0
2025-07-24 00:11:29,358 sats.satellite.EO-3            INFO       <264.00> EO-3: setting timed terminal event at 600.0
2025-07-24 00:11:29,496 sats.satellite.EO-1            INFO       <480.00> EO-1: imaged Target(tgt-221)
2025-07-24 00:11:29,499 data.base                      INFO       <480.00> Total reward: {'EO-1': 0.22904887971993804}
2025-07-24 00:11:29,501 sats.satellite.EO-1            INFO       <480.00> EO-1: Satellite EO-1 requires retasking
2025-07-24 00:11:29,502 sats.satellite.EO-2            INFO       <480.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-07-24 00:11:29,563 sats.satellite.EO-1            WARNING    <480.00> EO-1: failed battery_valid check
2025-07-24 00:11:29,565 gym                            INFO       <480.00> Step reward: -0.770951120280062
2025-07-24 00:11:29,565 gym                            INFO       <480.00> Episode terminated: True
2025-07-24 00:11:29,566 gym                            INFO       <480.00> Episode truncated: False
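The negative step reward can be reconciled with the positive imaging reward in the same log. The arithmetic below uses only the two logged numbers; the resulting penalty of about -1.0 for the failed satellite is inferred from this log and is an assumption, not a documented constant.

```python
import math

# Values copied from the log output above.
imaging_reward = 0.22904887971993804  # "Total reward" for EO-1
step_reward = -0.770951120280062      # "Step reward" for the same step

# Whatever was added on top of the imaging reward (assumed failure penalty).
implied_penalty = step_reward - imaging_reward
print(implied_penalty)
print(math.isclose(implied_penalty, -1.0, abs_tol=1e-9))
```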

PettingZoo API

ConstellationTasking, the PettingZoo parallel API environment, is largely the same as GeneralSatelliteTasking. See the PettingZoo documentation for a full description of the API. The main difference is that observations, rewards, and other per-agent values are organized as dictionaries keyed by agent name rather than as tuples.

[11]:

from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces

2025-07-24 00:11:29,574                                WARNING    Creating logger for new env on PID=4809. Old environments in process may now log times incorrectly.
2025-07-24 00:11:29,683 gym                            INFO       Resetting environment with seed=531148455
2025-07-24 00:11:29,686 scene.targets                  INFO       Generating 1000 targets
2025-07-24 00:11:30,059 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-07-24 00:11:30,101 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-07-24 00:11:30,138 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-07-24 00:11:30,173 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-07-24 00:11:30,209 gym                            INFO       <0.00> Environment reset

[11]:

{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}

[12]:

env.action_spaces

[12]:

{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}
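The tuple layout of the Gym API and the dictionary layout shown here carry the same per-satellite values, indexed differently. A minimal sketch of converting between the two follows; to_agent_dict and to_tuple are hypothetical helpers, and the agent names are copied from the output above.

```python
def to_agent_dict(agent_names, values):
    """Pair positional values (Gym-style) with agent names (PettingZoo-style)."""
    if len(agent_names) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agent_names, values))


def to_tuple(agent_names, per_agent, default=None):
    """Order a per-agent dict by agent_names, filling gaps with default."""
    return tuple(per_agent.get(name, default) for name in agent_names)


names = ["EO-1", "EO-2", "EO-3"]

# Positional actions become a dict keyed by agent name.
actions = to_agent_dict(names, (7, 9, 8))
print(actions)

# A dict that omits an agent maps back to None in that position.
partial = to_tuple(names, {"EO-2": 0})
print(partial)
```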

Actions are passed as a dictionary; the agent names can be accessed through the agents property.

[13]:

observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)

2025-07-24 00:11:30,222 gym                            INFO       <0.00> === STARTING STEP ===
2025-07-24 00:11:30,223 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-07-24 00:11:30,224 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-643) tasked for imaging
2025-07-24 00:11:30,225 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-643) window enabled: 232.5 to 274.5
2025-07-24 00:11:30,226 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 274.5
2025-07-24 00:11:30,227 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-07-24 00:11:30,227 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-889) tasked for imaging
2025-07-24 00:11:30,229 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-889) window enabled: 325.5 to 532.4
2025-07-24 00:11:30,229 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 532.4
2025-07-24 00:11:30,230 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-07-24 00:11:30,231 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-72) tasked for imaging
2025-07-24 00:11:30,232 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-72) window enabled: 644.1 to 819.0
2025-07-24 00:11:30,233 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 819.0
2025-07-24 00:11:30,381 sats.satellite.EO-1            INFO       <235.00> EO-1: imaged Target(tgt-643)
2025-07-24 00:11:30,386 data.base                      INFO       <235.00> Total reward: {'EO-1': 0.7218348084957107}
2025-07-24 00:11:30,387 sats.satellite.EO-1            INFO       <235.00> EO-1: Satellite EO-1 requires retasking
2025-07-24 00:11:30,389 sats.satellite.EO-2            INFO       <235.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-07-24 00:11:30,433 gym                            INFO       <235.00> Step reward: {'EO-1': 0.7218348084957107}

[14]:

observation

[14]:

{'EO-1': array([ 0.13768644, -0.03257078,  0.9248552 , -0.0225186 ,  0.27746668,
         0.01678278,  0.35614226,  0.01701808,  0.26577624,  0.02463894,
         0.83775528,  0.01345715,  0.63065208,  0.01057402,  0.2062252 ,
         0.02316694,  0.52188136,  0.02110127,  0.23982244,  0.02188261]),
 'EO-2': array([ 0.12333391, -0.00600153,  0.26756916, -0.0082143 ,  0.38535228,
         0.02497152,  0.69433566,  0.01371252,  0.96368168,  0.01588323,
         0.65932197,  0.0276532 ,  0.97044995,  0.05163078,  0.37576839,
         0.07023694,  0.09069118,  0.08090633,  0.72102421,  0.07577991]),
 'EO-3': array([ 0.87930137, -0.00955699,  0.28381754, -0.02088986,  0.50568544,
         0.00246204,  0.27769423, -0.00788257,  0.04615604,  0.02873789,
         0.70267742,  0.00221337,  0.95624156,  0.01312664,  0.72695931,
         0.07177669,  0.0750084 ,  0.07019563,  0.0440381 ,  0.07694222])}

Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.

[15]:

# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents

[15]:

['EO-1', 'EO-2', 'EO-3']

[16]:

observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)

2025-07-24 00:11:30,450 gym                            INFO       <235.00> === STARTING STEP ===
2025-07-24 00:11:30,451 sats.satellite.EO-1            INFO       <235.00> EO-1: target index 7 tasked
2025-07-24 00:11:30,452 sats.satellite.EO-1            INFO       <235.00> EO-1: Target(tgt-888) tasked for imaging
2025-07-24 00:11:30,453 sats.satellite.EO-1            INFO       <235.00> EO-1: Target(tgt-888) window enabled: 367.1 to 536.4
2025-07-24 00:11:30,454 sats.satellite.EO-1            INFO       <235.00> EO-1: setting timed terminal event at 536.4
2025-07-24 00:11:30,455 sats.satellite.EO-2            INFO       <235.00> EO-2: target index 9 tasked
2025-07-24 00:11:30,455 sats.satellite.EO-2            INFO       <235.00> EO-2: Target(tgt-416) tasked for imaging
2025-07-24 00:11:30,457 sats.satellite.EO-2            INFO       <235.00> EO-2: Target(tgt-416) window enabled: 666.9 to 864.8
2025-07-24 00:11:30,457 sats.satellite.EO-2            INFO       <235.00> EO-2: setting timed terminal event at 864.8
2025-07-24 00:11:30,539 sats.satellite.EO-1            INFO       <370.00> EO-1: imaged Target(tgt-888)
2025-07-24 00:11:30,543 data.base                      INFO       <370.00> Total reward: {'EO-1': 0.20622519501708747}
2025-07-24 00:11:30,545 sats.satellite.EO-1            INFO       <370.00> EO-1: Satellite EO-1 requires retasking
2025-07-24 00:11:30,547 sats.satellite.EO-1            INFO       <370.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-07-24 00:11:30,585 gym                            INFO       <370.00> Step reward: {'EO-1': -0.7937748049829125}
2025-07-24 00:11:30,586 gym                            INFO       <370.00> Episode terminated: ['EO-1']
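A control loop over a parallel-API environment typically builds its action dictionary from the agents that are still live on each step. The sketch below uses a small stand-in class so it runs without a simulator; StubParallelEnv and run are hypothetical, and the sketch assumes the usual PettingZoo convention that terminated agents are dropped from the agents list after the step in which they terminate.

```python
class StubParallelEnv:
    """Stand-in that mimics only the agent bookkeeping of a parallel env.

    Hypothetical class for illustration; it has no dynamics and is not
    part of bsk_rl or PettingZoo.
    """

    def __init__(self, agents, dies_at):
        self.agents = list(agents)
        self._dies_at = dict(dies_at)  # agent name -> step number of failure
        self._step_count = 0

    def step(self, actions):
        self._step_count += 1
        terminated = {
            agent: self._dies_at.get(agent) == self._step_count
            for agent in self.agents
        }
        truncated = {agent: False for agent in self.agents}
        rewards = {agent: -1.0 if terminated[agent] else 0.0 for agent in self.agents}
        observations = {agent: None for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        # Terminated agents leave the live list; the episode continues.
        self.agents = [agent for agent in self.agents if not terminated[agent]]
        return observations, rewards, terminated, truncated, infos


def run(env, policy, max_steps=10):
    """Step until no agents remain or max_steps is reached.

    Returns the sorted names of the agents that were given an action on each step.
    """
    history = []
    for _ in range(max_steps):
        if not env.agents:
            break
        # Only live agents receive an action.
        actions = {agent: policy(agent) for agent in env.agents}
        env.step(actions)
        history.append(sorted(actions))
    return history


env_stub = StubParallelEnv(["EO-1", "EO-2", "EO-3"], dies_at={"EO-1": 1})
history = run(env_stub, policy=lambda agent: 0, max_steps=3)
print(history)
```

In the stub, EO-1 fails on the first step and the remaining two satellites keep receiving actions, mirroring the per-agent termination reported in the log above.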