Multi-Agent Environments

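Two multi-agent environments are provided in the package:

- GeneralSatelliteTasking, a Gymnasium environment that exchanges actions and observations as tuples.
- ConstellationTasking, which implements the PettingZoo parallel API.

The latter is preferable for multi-agent RL (MARL) settings, as most MARL algorithms are designed for this kind of API.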

Configuring the Environment

For this example, a multi-satellite target imaging environment is used. The goal is to maximize the total value of unique images taken.

As usual, the satellite type is defined first.

[1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a sat_arg_randomizer is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.

[2]:

from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
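For intuition about what a Walker-delta layout means, the sketch below computes plane and in-plane angles from the standard Walker-delta definition. It is not the implementation of walker_delta_args: the function walker_delta_elements and its parameters are hypothetical, and it does none of the randomization described above.

```python
def walker_delta_elements(n_sats, n_planes, phasing=0):
    """Return (raan_deg, anomaly_deg) per satellite for a Walker-delta pattern.

    Hypothetical helper for illustration; not part of bsk_rl.
    """
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    per_plane = n_sats // n_planes
    elements = []
    for plane in range(n_planes):
        # Planes are spaced evenly in right ascension of the ascending node.
        raan = 360.0 * plane / n_planes
        for slot in range(per_plane):
            # Satellites are spaced evenly within a plane, with an
            # inter-plane phase offset set by the phasing parameter.
            anomaly = 360.0 * slot / per_plane + 360.0 * phasing * plane / n_sats
            elements.append((raan, anomaly % 360.0))
    return elements


# Three satellites in one plane, as in this example's constellation size.
print(walker_delta_elements(3, 1))
```

With a single plane, the pattern reduces to satellites spaced evenly around one orbit, which is why the text calls it trivial.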

Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the environment.

[3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
2025-10-16 18:33:30,723 gym                            INFO       Resetting environment with seed=3467455020
2025-10-16 18:33:30,726 scene.targets                  INFO       Generating 1000 targets
2025-10-16 18:33:30,985 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-10-16 18:33:31,024 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-10-16 18:33:31,059 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-10-16 18:33:31,103 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-10-16 18:33:31,139 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-10-16 18:33:31,182 gym                            INFO       <0.00> Environment reset
[3]:
Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))
[4]:
env.action_space
[4]:
Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a tuple with one entry per satellite, in the same order as the satellites; the list used here works as well. The step ends the first time any satellite completes an action.

[5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
2025-10-16 18:33:31,196 gym                            INFO       <0.00> === STARTING STEP ===
2025-10-16 18:33:31,197 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-10-16 18:33:31,197 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-813) tasked for imaging
2025-10-16 18:33:31,198 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-813) window enabled: 477.3 to 676.6
2025-10-16 18:33:31,199 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 676.6
2025-10-16 18:33:31,200 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-10-16 18:33:31,200 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-927) tasked for imaging
2025-10-16 18:33:31,201 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-927) window enabled: 298.4 to 488.6
2025-10-16 18:33:31,201 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 488.6
2025-10-16 18:33:31,202 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-10-16 18:33:31,203 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-128) tasked for imaging
2025-10-16 18:33:31,205 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-128) window enabled: 604.7 to 792.9
2025-10-16 18:33:31,205 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 792.9
2025-10-16 18:33:31,268 sats.satellite.EO-2            INFO       <301.00> EO-2: imaged Target(tgt-927)
2025-10-16 18:33:31,269 data.base                      INFO       <301.00> Total reward: {'EO-2': 0.39348166025353615}
2025-10-16 18:33:31,269 sats.satellite.EO-2            INFO       <301.00> EO-2: Satellite EO-2 requires retasking
2025-10-16 18:33:31,272 gym                            INFO       <301.00> Step reward: 0.39348166025353615
[6]:
observation
[6]:
(array([ 0.37306587, -0.02545513,  0.21272637, -0.02241031,  0.33036589,
        -0.02282521,  0.93759701,  0.04405742,  0.0544906 ,  0.03093422,
         0.9221019 ,  0.02543819,  0.32909388,  0.05809488,  0.3065742 ,
         0.05535989,  0.68823593,  0.06821002,  0.53806702,  0.08285302]),
 array([ 0.20009264, -0.02400116,  0.63678191, -0.02685195,  0.24678511,
        -0.02262457,  0.32874149, -0.01574447,  0.14997187,  0.00274267,
         0.73885331,  0.01074702,  0.90784823,  0.04889327,  0.33682647,
         0.05144065,  0.34053307,  0.04028612,  0.92518588,  0.02709524]),
 array([ 0.57508268, -0.03247803,  0.19666447,  0.0018073 ,  0.89676445,
         0.00665213,  0.66492097,  0.00835257,  0.37838721,  0.01368084,
         0.47804212,  0.05328345,  0.49407615,  0.07127237,  0.49037092,
         0.05746829,  0.83477639,  0.06984705,  0.25452044,  0.07860376]))

At this point, either every satellite can be retasked, or satellites can continue their previous action by passing None as the action. To see which satellites must be retasked (i.e. their previous action is done and they have nothing more to do), look at "requires_retasking" in each satellite’s info.

[7]:
info
[7]:
{'EO-1': {'requires_retasking': False},
 'EO-2': {'requires_retasking': True},
 'EO-3': {'requires_retasking': False},
 'd_ts': 301.0}

Based on this information, only the satellite that requires retasking is given a new action here; the others receive None and continue their current action.

[8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions
[8]:
[None, 0, None]
[9]:
observation, reward, terminated, truncated, info = env.step(actions)
2025-10-16 18:33:31,293 gym                            INFO       <301.00> === STARTING STEP ===
2025-10-16 18:33:31,294 sats.satellite.EO-2            INFO       <301.00> EO-2: target index 0 tasked
2025-10-16 18:33:31,295 sats.satellite.EO-2            INFO       <301.00> EO-2: Target(tgt-859) tasked for imaging
2025-10-16 18:33:31,296 sats.satellite.EO-2            INFO       <301.00> EO-2: Target(tgt-859) window enabled: 164.2 to 336.9
2025-10-16 18:33:31,296 sats.satellite.EO-2            INFO       <301.00> EO-2: setting timed terminal event at 336.9
2025-10-16 18:33:31,305 sats.satellite.EO-2            INFO       <337.00> EO-2: timed termination at 336.9 for Target(tgt-859) window
2025-10-16 18:33:31,306 data.base                      INFO       <337.00> Total reward: {}
2025-10-16 18:33:31,307 sats.satellite.EO-2            INFO       <337.00> EO-2: Satellite EO-2 requires retasking
2025-10-16 18:33:31,309 gym                            INFO       <337.00> Step reward: 0.0
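The retasking pattern from the previous two cells can be wrapped in a small helper that asks a policy for a new action only when a satellite needs one. The sketch below is plain Python operating on an info dictionary shaped like the one shown above; retasking_actions and the policy callable are hypothetical names, not part of bsk_rl.

```python
def retasking_actions(satellite_names, info, policy):
    """Build an action list that only retasks satellites that need it.

    policy is any callable mapping a satellite name to an action index.
    """
    actions = []
    for name in satellite_names:
        if info[name]["requires_retasking"]:
            actions.append(policy(name))
        else:
            # None tells the environment to continue the current action.
            actions.append(None)
    return actions


info = {
    "EO-1": {"requires_retasking": False},
    "EO-2": {"requires_retasking": True},
    "EO-3": {"requires_retasking": False},
    "d_ts": 301.0,
}
print(retasking_actions(["EO-1", "EO-2", "EO-3"], info, lambda name: 0))
# [None, 0, None]
```

Iterating over the satellite names rather than over info avoids tripping on non-satellite keys such as "d_ts".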

With the Gym API, the episode terminates if any agent dies. To demonstrate this, one satellite is forcibly killed.

[10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2025-10-16 18:33:31,315 gym                            INFO       <337.00> === STARTING STEP ===
2025-10-16 18:33:31,316 sats.satellite.EO-1            INFO       <337.00> EO-1: target index 6 tasked
2025-10-16 18:33:31,316 sats.satellite.EO-1            INFO       <337.00> EO-1: Target(tgt-72) tasked for imaging
2025-10-16 18:33:31,317 sats.satellite.EO-1            INFO       <337.00> EO-1: Target(tgt-72) window enabled: 616.6 to 794.6
2025-10-16 18:33:31,317 sats.satellite.EO-1            INFO       <337.00> EO-1: setting timed terminal event at 794.6
2025-10-16 18:33:31,318 sats.satellite.EO-2            INFO       <337.00> EO-2: target index 7 tasked
2025-10-16 18:33:31,319 sats.satellite.EO-2            INFO       <337.00> EO-2: Target(tgt-139) tasked for imaging
2025-10-16 18:33:31,319 sats.satellite.EO-2            INFO       <337.00> EO-2: Target(tgt-139) window enabled: 530.6 to 600.0
2025-10-16 18:33:31,320 sats.satellite.EO-2            INFO       <337.00> EO-2: setting timed terminal event at 600.0
2025-10-16 18:33:31,321 sats.satellite.EO-3            INFO       <337.00> EO-3: target index 9 tasked
2025-10-16 18:33:31,321 sats.satellite.EO-3            INFO       <337.00> EO-3: Target(tgt-534) tasked for imaging
2025-10-16 18:33:31,322 sats.satellite.EO-3            INFO       <337.00> EO-3: Target(tgt-534) window enabled: 806.4 to 903.4
2025-10-16 18:33:31,322 sats.satellite.EO-3            INFO       <337.00> EO-3: setting timed terminal event at 903.4
2025-10-16 18:33:31,371 sats.satellite.EO-2            INFO       <533.00> EO-2: imaged Target(tgt-139)
2025-10-16 18:33:31,371 data.base                      INFO       <533.00> Total reward: {'EO-2': 0.340533067790575}
2025-10-16 18:33:31,372 sats.satellite.EO-2            INFO       <533.00> EO-2: Satellite EO-2 requires retasking
2025-10-16 18:33:31,373 sats.satellite.EO-1            INFO       <533.00> EO-1: Finding opportunity windows from 1200.00 to 1800.00 seconds
2025-10-16 18:33:31,420 sats.satellite.EO-2            INFO       <533.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2025-10-16 18:33:31,459 sats.satellite.EO-1            WARNING    <533.00> EO-1: failed battery_valid check
2025-10-16 18:33:31,460 gym                            INFO       <533.00> Step reward: -0.659466932209425
2025-10-16 18:33:31,460 gym                            INFO       <533.00> Episode terminated: True
2025-10-16 18:33:31,461 gym                            INFO       <533.00> Episode truncated: False
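The step reward in the log above is consistent with the image reward earned by EO-2 plus a penalty of -1.0 for the failed satellite. The sketch below reproduces that arithmetic. The penalty value is inferred from the logged numbers, and step_reward is a hypothetical helper rather than the environment's own code.

```python
import math


def step_reward(image_rewards, failed_satellites, failure_penalty=-1.0):
    """Sum per-satellite image rewards and add one penalty per failure.

    Hypothetical helper; failure_penalty=-1.0 is inferred from the log.
    """
    return sum(image_rewards.values()) + failure_penalty * len(failed_satellites)


reward = step_reward({"EO-2": 0.340533067790575}, ["EO-1"])
# Compare with the value reported as "Step reward" in the log.
print(math.isclose(reward, -0.659466932209425))
```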

PettingZoo API

The PettingZoo parallel API environment, ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo documentation for a full description of the API. It organizes per-agent quantities into dictionaries keyed by agent name, rather than tuples.

[11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
2025-10-16 18:33:31,469                                WARNING    Creating logger for new env on PID=4799. Old environments in process may now log times incorrectly.
2025-10-16 18:33:31,575 gym                            INFO       Resetting environment with seed=23659936
2025-10-16 18:33:31,577 scene.targets                  INFO       Generating 1000 targets
2025-10-16 18:33:31,738 sats.satellite.EO-1            INFO       <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2025-10-16 18:33:31,786 sats.satellite.EO-2            INFO       <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2025-10-16 18:33:31,829 sats.satellite.EO-3            INFO       <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2025-10-16 18:33:31,870 gym                            INFO       <0.00> Environment reset
[11]:
{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}
[12]:
env.action_spaces
[12]:
{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the agents property.

[13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
2025-10-16 18:33:31,883 gym                            INFO       <0.00> === STARTING STEP ===
2025-10-16 18:33:31,884 sats.satellite.EO-1            INFO       <0.00> EO-1: target index 7 tasked
2025-10-16 18:33:31,884 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-937) tasked for imaging
2025-10-16 18:33:31,885 sats.satellite.EO-1            INFO       <0.00> EO-1: Target(tgt-937) window enabled: 93.4 to 291.1
2025-10-16 18:33:31,886 sats.satellite.EO-1            INFO       <0.00> EO-1: setting timed terminal event at 291.1
2025-10-16 18:33:31,887 sats.satellite.EO-2            INFO       <0.00> EO-2: target index 9 tasked
2025-10-16 18:33:31,887 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-495) tasked for imaging
2025-10-16 18:33:31,888 sats.satellite.EO-2            INFO       <0.00> EO-2: Target(tgt-495) window enabled: 302.5 to 472.3
2025-10-16 18:33:31,888 sats.satellite.EO-2            INFO       <0.00> EO-2: setting timed terminal event at 472.3
2025-10-16 18:33:31,889 sats.satellite.EO-3            INFO       <0.00> EO-3: target index 8 tasked
2025-10-16 18:33:31,889 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-993) tasked for imaging
2025-10-16 18:33:31,890 sats.satellite.EO-3            INFO       <0.00> EO-3: Target(tgt-993) window enabled: 127.7 to 335.7
2025-10-16 18:33:31,890 sats.satellite.EO-3            INFO       <0.00> EO-3: setting timed terminal event at 335.7
2025-10-16 18:33:31,916 sats.satellite.EO-1            INFO       <98.00> EO-1: imaged Target(tgt-937)
2025-10-16 18:33:31,917 data.base                      INFO       <98.00> Total reward: {'EO-1': 0.5870099889958942}
2025-10-16 18:33:31,917 sats.satellite.EO-1            INFO       <98.00> EO-1: Satellite EO-1 requires retasking
2025-10-16 18:33:31,921 gym                            INFO       <98.00> Step reward: {'EO-1': 0.5870099889958942}
[14]:
observation
[14]:
{'EO-1': array([ 7.24758570e-01, -8.75078140e-03,  4.43046361e-01, -1.54097788e-02,
         9.19792959e-01, -1.50686629e-02,  5.74213143e-01, -1.10703283e-02,
         8.78799776e-01,  1.22221414e-02,  8.98026480e-01, -7.89825965e-04,
         6.57794345e-02,  1.44706942e-02,  4.92193184e-01,  6.81685227e-03,
         1.43905551e-01,  1.56349862e-02,  9.27807154e-01,  2.25150079e-02]),
 'EO-2': array([0.78239239, 0.00311302, 0.49493697, 0.0188918 , 0.38345857,
        0.00679904, 0.10769131, 0.02141364, 0.01865208, 0.02122341,
        0.79402482, 0.017308  , 0.10794725, 0.01468513, 0.40311949,
        0.02308148, 0.33267162, 0.02656222, 0.1962493 , 0.03587294]),
 'EO-3': array([ 0.45441042, -0.01719298,  0.14526972, -0.01719298,  0.09164544,
        -0.01719298,  0.22802086,  0.00297535,  0.31352909,  0.005079  ,
         0.92637634,  0.00521634,  0.00450441,  0.00936446,  0.87924166,
         0.03301435,  0.64017551,  0.03147354,  0.09597555,  0.04981483])}

Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API is that it allows for individual agents to fail without terminating the entire environment.

[15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
[15]:
['EO-1', 'EO-2', 'EO-3']
[16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
2025-10-16 18:33:31,936 gym                            INFO       <98.00> === STARTING STEP ===
2025-10-16 18:33:31,937 sats.satellite.EO-1            INFO       <98.00> EO-1: target index 7 tasked
2025-10-16 18:33:31,937 sats.satellite.EO-1            INFO       <98.00> EO-1: Target(tgt-246) tasked for imaging
2025-10-16 18:33:31,938 sats.satellite.EO-1            INFO       <98.00> EO-1: Target(tgt-246) window enabled: 136.9 to 299.5
2025-10-16 18:33:31,939 sats.satellite.EO-1            INFO       <98.00> EO-1: setting timed terminal event at 299.5
2025-10-16 18:33:31,939 sats.satellite.EO-2            INFO       <98.00> EO-2: target index 9 tasked
2025-10-16 18:33:31,940 sats.satellite.EO-2            INFO       <98.00> EO-2: Target(tgt-495) window enabled: 302.5 to 472.3
2025-10-16 18:33:31,940 sats.satellite.EO-2            INFO       <98.00> EO-2: setting timed terminal event at 472.3
2025-10-16 18:33:31,949 sats.satellite.EO-3            INFO       <130.00> EO-3: imaged Target(tgt-993)
2025-10-16 18:33:31,949 data.base                      INFO       <130.00> Total reward: {'EO-3': 0.9263763405144467}
2025-10-16 18:33:31,950 sats.satellite.EO-3            INFO       <130.00> EO-3: Satellite EO-3 requires retasking
2025-10-16 18:33:31,953 gym                            INFO       <130.00> Step reward: {'EO-1': -1.0, 'EO-3': 0.9263763405144467}
2025-10-16 18:33:31,954 gym                            INFO       <130.00> Episode terminated: ['EO-1']
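Because individual agents can drop out, a rollout loop for this API should build its action dictionary from the agents that are still active at each step. The sketch below assumes, as the PettingZoo parallel API specifies, that env.agents lists only the currently active agents and that reset returns observations and infos. StubParallelEnv is a hypothetical stand-in used to keep the example runnable without a simulator; it is not ConstellationTasking, and rollout is not part of bsk_rl.

```python
class StubParallelEnv:
    """Hypothetical stand-in: 'EO-1' fails on step 2, episode truncates on step 3."""

    def __init__(self):
        self.agents = []
        self._t = 0

    def reset(self):
        self.agents = ["EO-1", "EO-2"]
        self._t = 0
        return {a: 0.0 for a in self.agents}, {a: {} for a in self.agents}

    def step(self, actions):
        assert set(actions) == set(self.agents), "one action per active agent"
        self._t += 1
        acting = list(self.agents)
        rewards = {a: 1.0 for a in acting}
        terminations = {a: False for a in acting}
        if self._t == 2 and "EO-1" in acting:
            rewards["EO-1"] = -1.0
            terminations["EO-1"] = True
        truncations = {a: self._t >= 3 for a in acting}
        # Agents that are done are removed from the active list.
        self.agents = [a for a in acting if not (terminations[a] or truncations[a])]
        observations = {a: float(self._t) for a in acting}
        return observations, rewards, terminations, truncations, {a: {} for a in acting}


def rollout(env, policy, max_steps=100):
    """Step until no agents remain, acting only for active agents."""
    observations, infos = env.reset()
    totals = {}
    for _ in range(max_steps):
        if not env.agents:
            break
        actions = {agent: policy(agent, observations[agent]) for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, reward in rewards.items():
            totals[agent] = totals.get(agent, 0.0) + reward
    return totals


print(rollout(StubParallelEnv(), lambda agent, obs: 0))
# {'EO-1': 0.0, 'EO-2': 3.0}
```

In the stub, EO-2 keeps collecting reward after EO-1 fails, mirroring the behavior shown in the log above.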