# Multi-Agent Environments

Two multi-agent environments are provided in the package:

* [GeneralSatelliteTasking](../api_reference/index.rst#bsk_rl.GeneralSatelliteTasking), 
 a [Gymnasium](https://gymnasium.farama.org)-based environment and the basis for all other environments.
* [ConstellationTasking](../api_reference/index.rst#bsk_rl.ConstellationTasking), which
 implements the [PettingZoo parallel API](https://pettingzoo.farama.org/api/parallel/).

The latter is preferable for multi-agent RL (MARL) settings, as most algorithms are designed
for this kind of API.

## Configuring the Environment

For this example, a multi-satellite target imaging environment will be used. The goal is
to maximize the total value of unique images taken.

As usual, the satellite type is defined first.

In [1]:
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel

Satellite properties are set to give the satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a ``sat_arg_randomizer`` is set and passed to the environment. In this case, the satellites are distributed in a trivial single-plane Walker-delta constellation.

In [2]:

from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
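
As a rough illustration of what a single-plane Walker-delta layout implies, the sketch below spaces satellites evenly in true anomaly within each plane. The function `walker_delta_phasing` and its parameters are hypothetical names introduced here; this is not the implementation behind `walker_delta_args`, and it ignores any randomization or reference-orbit handling that function may apply.

```python
def walker_delta_phasing(n_sats, n_planes=1, rel_phasing=0):
    """Return (raan_deg, true_anomaly_deg) for each satellite.

    Hypothetical helper for illustration only: planes are spread evenly in
    right ascension, satellites are spread evenly within each plane, and
    adjacent planes are offset by rel_phasing * 360 / n_sats degrees.
    """
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    per_plane = n_sats // n_planes
    slots = []
    for plane in range(n_planes):
        raan = 360.0 * plane / n_planes
        for k in range(per_plane):
            in_plane = 360.0 * k / per_plane
            offset = 360.0 * rel_phasing * plane / n_sats
            slots.append((raan, (in_plane + offset) % 360.0))
    return slots


# Three satellites in one plane, as in this example's constellation.
slots = walker_delta_phasing(3, n_planes=1)
print(slots)
```

With a single plane, the pattern reduces to three satellites sharing one orbit and separated by 120 degrees, which is why the text calls it trivial.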

## Gym API

GeneralSatelliteTasking uses tuples of actions and observations to interact with the
environment.

In [3]:
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space

2024-09-12 15:06:03,760 gym INFO Resetting environment with seed=1879338696
2024-09-12 15:06:03,761 scene.targets INFO Generating 1000 targets
2024-09-12 15:06:03,920 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-12 15:06:03,943 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-12 15:06:03,964 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-12 15:06:03,982 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-12 15:06:04,000 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-12 15:06:04,020 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-12 15:06:04,043 gym INFO <0.00> Environment reset

Tuple(Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64), Box(-1e+16, 1e+16, (20,), float64))

In [4]:
env.action_space

Tuple(Discrete(10), Discrete(10), Discrete(10))

Consequently, actions are passed as a sequence with one entry per satellite, in the same
order as the satellites. The step ends the first time any satellite completes its action.
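
The stopping rule can be pictured as picking the earliest pending event. The sketch below is a toy model only: the event times are copied from the log output of the step that follows, the labels and the `first_event` helper are hypothetical, and the real simulation detects events while integrating rather than from a precomputed list.

```python
# Event times in seconds, copied from the log of the step that follows.
# The labels are descriptive only and are not identifiers from the package.
events = [
    ("EO-1 imaged target", 465.0),
    ("EO-1 window closes", 548.3),
    ("EO-2 window closes", 833.3),
    ("EO-3 window closes", 1077.6),
]


def first_event(pending):
    """Return the (label, time) pair with the smallest time."""
    return min(pending, key=lambda event: event[1])


label, step_end = first_event(events)
print(f"step ends at {step_end} s because: {label}")
```

Here the step ends at 465 s, when EO-1 images its target, even though EO-2 and EO-3 are still slewing toward theirs.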

In [5]:
observation, reward, terminated, truncated, info = env.step([7, 9, 8])

2024-09-12 15:06:04,053 gym INFO <0.00> === STARTING STEP ===
2024-09-12 15:06:04,053 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2024-09-12 15:06:04,053 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-195) tasked for imaging
2024-09-12 15:06:04,054 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-195) window enabled: 462.8 to 548.3
2024-09-12 15:06:04,054 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 548.3
2024-09-12 15:06:04,055 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2024-09-12 15:06:04,055 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-680) tasked for imaging
2024-09-12 15:06:04,056 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-680) window enabled: 646.1 to 833.3
2024-09-12 15:06:04,056 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 833.3
2024-09-12 15:06:04,056 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2024-09-12 15:06:04,057 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-211) tasked for imaging
2024-09-12 15:06:04,057 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-211) window enabled: 986.2 to 1077.6
2024-09-12 15:06:04,057 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 1077.6
2024-09-12 15:06:04,148 sats.satellite.EO-1 INFO <465.00> EO-1: imaged Target(tgt-195)
2024-09-12 15:06:04,152 data.base INFO <465.00> Data reward: {'EO-1': 0.8230500147347414, 'EO-2': 0.0, 'EO-3': 0.0}
2024-09-12 15:06:04,156 sats.satellite.EO-1 INFO <465.00> EO-1: Satellite EO-1 requires retasking
2024-09-12 15:06:04,156 sats.satellite.EO-2 INFO <465.00> EO-2: Finding opportunity windows from 1200.00 to 1800.00 seconds
2024-09-12 15:06:04,182 gym INFO <465.00> Step reward: 0.8230500147347414

In [6]:
observation

(array([ 0.21602972, -0.00403203, 0.28284637, 0.00356078, 0.4518415 ,
 0.04613867, 0.47992008, 0.05650256, 0.64224289, 0.06190224,
 0.74877742, 0.07368187, 0.59800648, 0.05895398, 0.30639417,
 0.06799013, 0.08390608, 0.0861771 , 0.47193369, 0.10286262]),
 array([0.88676028, 0.00289549, 0.26382734, 0.02413358, 0.33591011,
 0.03177945, 0.7800974 , 0.05823664, 0.76553764, 0.05932064,
 0.48517132, 0.09174613, 0.49501144, 0.11767415, 0.60871352,
 0.16188445, 0.53565483, 0.15520731, 0.7778392 , 0.15127552]),
 array([1.92145237e-01, 7.09354487e-04, 7.69811097e-02, 9.43758064e-03,
 8.96589817e-01, 4.59254776e-03, 9.05020147e-01, 2.69390689e-02,
 9.13063309e-01, 9.14312097e-02, 9.30135351e-01, 8.71906659e-02,
 1.55938107e-01, 7.51445464e-02, 2.97161675e-01, 8.92334327e-02,
 8.99333303e-01, 1.10183857e-01, 1.88123050e-01, 1.27600018e-01]))
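
Each vector has 20 elements, which matches 2 observed properties for each of the 10 upcoming targets. The sketch below decodes the first few values of EO-1's vector under the assumption that the properties are interleaved per target (priority, then normalized `opportunity_open`). That ordering is inferred from the logged numbers rather than stated by this page: the first element equals the reward later logged for imaging target index 0, and the second element times 5700 is about -23 s, matching a window that opened at 442.0 s when the time is 465.0 s. The `decode` helper is a hypothetical name.

```python
N_PROPS = 2         # priority and opportunity_open, per observation_spec
OPEN_NORM = 5700.0  # norm given for opportunity_open in observation_spec


def decode(obs_vector):
    """Split a flat observation into (priority, seconds_until_open) pairs.

    Assumes per-target interleaving, which is an inference from the logs.
    """
    pairs = []
    for i in range(0, len(obs_vector), N_PROPS):
        priority = obs_vector[i]
        opens_in_s = obs_vector[i + 1] * OPEN_NORM
        pairs.append((priority, opens_in_s))
    return pairs


# First four values of EO-1's observation, copied from the output above.
eo1_head = [0.21602972, -0.00403203, 0.28284637, 0.00356078]
for priority, opens_in_s in decode(eo1_head):
    print(f"priority={priority:.3f}, opens in {opens_in_s:.1f} s")
```

A negative time until opening means the window is already open at the current simulation time.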

At this point, either every satellite can be retasked, or satellites can continue their
previous action by passing `None` as the action. To see which satellites must be
retasked (i.e. their previous action is done and they have nothing more to do), look at
`"requires_retasking"` in each satellite's info.

In [7]:
info

{'EO-1': {'requires_retasking': True},
 'EO-2': {'requires_retasking': False},
 'EO-3': {'requires_retasking': False},
 'd_ts': 465.00000000000006}

Based on this information, only the satellite that requires retasking is given a new action here.

In [8]:
actions = [0 if info[sat.name]["requires_retasking"] else None for sat in env.unwrapped.satellites]
actions

[0, None, None]

In [9]:
observation, reward, terminated, truncated, info = env.step(actions)

2024-09-12 15:06:04,195 gym INFO <465.00> === STARTING STEP ===
2024-09-12 15:06:04,195 sats.satellite.EO-1 INFO <465.00> EO-1: target index 0 tasked
2024-09-12 15:06:04,196 sats.satellite.EO-1 INFO <465.00> EO-1: Target(tgt-457) tasked for imaging
2024-09-12 15:06:04,196 sats.satellite.EO-1 INFO <465.00> EO-1: Target(tgt-457) window enabled: 442.0 to 532.9
2024-09-12 15:06:04,197 sats.satellite.EO-1 INFO <465.00> EO-1: setting timed terminal event at 532.9
2024-09-12 15:06:04,210 sats.satellite.EO-1 INFO <529.00> EO-1: imaged Target(tgt-457)
2024-09-12 15:06:04,212 data.base INFO <529.00> Data reward: {'EO-1': 0.21602971982103714, 'EO-2': 0.0, 'EO-3': 0.0}
2024-09-12 15:06:04,213 sats.satellite.EO-1 INFO <529.00> EO-1: Satellite EO-1 requires retasking
2024-09-12 15:06:04,214 gym INFO <529.00> Step reward: 0.21602971982103714

With the Gym API, the whole episode terminates if any satellite dies. To demonstrate this,
one satellite is forcibly killed.

In [10]:
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])

2024-09-12 15:06:04,218 gym INFO <529.00> === STARTING STEP ===
2024-09-12 15:06:04,219 sats.satellite.EO-1 INFO <529.00> EO-1: target index 6 tasked
2024-09-12 15:06:04,219 sats.satellite.EO-1 INFO <529.00> EO-1: Target(tgt-507) tasked for imaging
2024-09-12 15:06:04,220 sats.satellite.EO-1 INFO <529.00> EO-1: Target(tgt-507) window enabled: 852.5 to 1057.2
2024-09-12 15:06:04,220 sats.satellite.EO-1 INFO <529.00> EO-1: setting timed terminal event at 1057.2
2024-09-12 15:06:04,221 sats.satellite.EO-2 INFO <529.00> EO-2: target index 7 tasked
2024-09-12 15:06:04,221 sats.satellite.EO-2 INFO <529.00> EO-2: Target(tgt-353) tasked for imaging
2024-09-12 15:06:04,222 sats.satellite.EO-2 INFO <529.00> EO-2: Target(tgt-353) window enabled: 1349.7 to 1528.5
2024-09-12 15:06:04,222 sats.satellite.EO-2 INFO <529.00> EO-2: setting timed terminal event at 1528.5
2024-09-12 15:06:04,222 sats.satellite.EO-3 INFO <529.00> EO-3: target index 9 tasked
2024-09-12 15:06:04,223 sats.satellite.EO-3 INFO <529.00> EO-3: Target(tgt-261) tasked for imaging
2024-09-12 15:06:04,223 sats.satellite.EO-3 INFO <529.00> EO-3: Target(tgt-261) window enabled: 1192.3 to 1200.0
2024-09-12 15:06:04,223 sats.satellite.EO-3 INFO <529.00> EO-3: setting timed terminal event at 1200.0
2024-09-12 15:06:04,285 sats.satellite.EO-1 INFO <855.00> EO-1: imaged Target(tgt-507)
2024-09-12 15:06:04,287 data.base INFO <855.00> Data reward: {'EO-1': 0.30639416945804554, 'EO-2': 0.0, 'EO-3': 0.0}
2024-09-12 15:06:04,290 sats.satellite.EO-1 INFO <855.00> EO-1: Satellite EO-1 requires retasking
2024-09-12 15:06:04,291 sats.satellite.EO-1 INFO <855.00> EO-1: Finding opportunity windows from 1200.00 to 1800.00 seconds
2024-09-12 15:06:04,312 gym INFO <855.00> Step reward: -0.6936058305419545
2024-09-12 15:06:04,312 gym INFO <855.00> Episode terminated: True
2024-09-12 15:06:04,312 gym INFO <855.00> Episode truncated: False

## PettingZoo API

The [PettingZoo parallel API](https://pettingzoo.farama.org/api/parallel/) environment,
ConstellationTasking, is largely the same as GeneralSatelliteTasking. See the PettingZoo
documentation for a full description of the API. It generally organizes data into
dictionaries keyed by agent name, rather than tuples.

In [11]:
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    communicator=comm.LOSCommunication(),  # Note that dyn must inherit from LOSCommunication
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces

2024-09-12 15:06:04,501 gym INFO Resetting environment with seed=2620988696
2024-09-12 15:06:04,502 scene.targets INFO Generating 1000 targets
2024-09-12 15:06:04,653 sats.satellite.EO-1 INFO <0.00> EO-1: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-12 15:06:04,674 sats.satellite.EO-2 INFO <0.00> EO-2: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-12 15:06:04,699 sats.satellite.EO-3 INFO <0.00> EO-3: Finding opportunity windows from 0.00 to 600.00 seconds
2024-09-12 15:06:04,725 gym INFO <0.00> Environment reset

{'EO-1': Box(-1e+16, 1e+16, (20,), float64),
 'EO-2': Box(-1e+16, 1e+16, (20,), float64),
 'EO-3': Box(-1e+16, 1e+16, (20,), float64)}

In [12]:
env.action_spaces

{'EO-1': Discrete(10), 'EO-2': Discrete(10), 'EO-3': Discrete(10)}

Actions are passed as a dictionary; the agent names can be accessed through the `agents`
property.
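
When actions are already available in Gym-style order, they can be re-keyed by agent name. The sketch below is a minimal illustration: `to_agent_dict` is a hypothetical helper, and the `agents` list is a stand-in for `env.agents` so the snippet runs without the simulator.

```python
def to_agent_dict(agents, values):
    """Pair each agent name with the value at the same position."""
    if len(agents) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agents, values))


# Stand-in for env.agents; in the notebook this would come from the environment.
agents = ["EO-1", "EO-2", "EO-3"]
actions = to_agent_dict(agents, (7, 9, 8))
print(actions)
```

The length check matters once agents can drop out, because a stale tuple would otherwise silently assign actions to the wrong satellites.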

In [13]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)

2024-09-12 15:06:04,734 gym INFO <0.00> === STARTING STEP ===
2024-09-12 15:06:04,734 sats.satellite.EO-1 INFO <0.00> EO-1: target index 7 tasked
2024-09-12 15:06:04,735 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-539) tasked for imaging
2024-09-12 15:06:04,735 sats.satellite.EO-1 INFO <0.00> EO-1: Target(tgt-539) window enabled: 288.7 to 440.8
2024-09-12 15:06:04,736 sats.satellite.EO-1 INFO <0.00> EO-1: setting timed terminal event at 440.8
2024-09-12 15:06:04,736 sats.satellite.EO-2 INFO <0.00> EO-2: target index 9 tasked
2024-09-12 15:06:04,736 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-937) tasked for imaging
2024-09-12 15:06:04,737 sats.satellite.EO-2 INFO <0.00> EO-2: Target(tgt-937) window enabled: 357.5 to 473.6
2024-09-12 15:06:04,737 sats.satellite.EO-2 INFO <0.00> EO-2: setting timed terminal event at 473.6
2024-09-12 15:06:04,737 sats.satellite.EO-3 INFO <0.00> EO-3: target index 8 tasked
2024-09-12 15:06:04,738 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-926) tasked for imaging
2024-09-12 15:06:04,738 sats.satellite.EO-3 INFO <0.00> EO-3: Target(tgt-926) window enabled: 353.2 to 451.1
2024-09-12 15:06:04,738 sats.satellite.EO-3 INFO <0.00> EO-3: setting timed terminal event at 451.1
2024-09-12 15:06:04,794 sats.satellite.EO-1 INFO <291.00> EO-1: imaged Target(tgt-539)
2024-09-12 15:06:04,796 data.base INFO <291.00> Data reward: {'EO-1': 0.662026254515778, 'EO-2': 0.0, 'EO-3': 0.0}
2024-09-12 15:06:04,799 sats.satellite.EO-1 INFO <291.00> EO-1: Satellite EO-1 requires retasking
2024-09-12 15:06:04,800 sats.satellite.EO-1 INFO <291.00> EO-1: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-12 15:06:04,824 sats.satellite.EO-3 INFO <291.00> EO-3: Finding opportunity windows from 600.00 to 1200.00 seconds
2024-09-12 15:06:04,850 gym INFO <291.00> Step reward: {'EO-1': 0.662026254515778, 'EO-2': 0.0, 'EO-3': 0.0}
2024-09-12 15:06:04,850 gym INFO <291.00> Episode terminated: {'EO-1': False, 'EO-2': False, 'EO-3': False}
2024-09-12 15:06:04,850 gym INFO <291.00> Episode truncated: {'EO-1': False, 'EO-2': False, 'EO-3': False}

In [14]:
observation

{'EO-1': array([ 3.54079768e-01, 1.43722174e-04, 3.41016950e-01, -1.19984827e-02,
 4.59862409e-01, -3.73977166e-04, 1.57034165e-01, 4.09939092e-02,
 4.77186531e-01, 1.85235817e-02, 7.89599807e-01, 5.51169539e-02,
 4.91062428e-01, 6.76157713e-02, 4.91219311e-01, 5.56755562e-02,
 6.77410361e-01, 7.07679283e-02, 2.94042355e-01, 1.12199602e-01]),
 'EO-2': array([ 9.14702578e-01, -2.65200788e-02, 4.95920248e-01, -2.44466556e-02,
 3.61190392e-01, -5.08222404e-03, 7.70723330e-01, -2.27018729e-02,
 3.24277884e-01, 5.20187817e-04, 6.73082171e-01, 1.16583843e-02,
 9.45554102e-01, 1.19332716e-02, 3.68596535e-01, 5.59413877e-03,
 2.98544255e-01, 3.60677170e-02, 3.48486992e-01, 3.10203217e-02]),
 'EO-3': array([ 0.52332492, -0.01215394, 0.98737445, 0.01091435, 0.4451887 ,
 0.00316024, 0.0034789 , 0.0223582 , 0.76490943, 0.01442445,
 0.51863226, 0.02706901, 0.92739003, 0.0306299 , 0.83661002,
 0.04446559, 0.73678664, 0.03750987, 0.85066234, 0.11141505])}

Other than compatibility with MARL algorithms, the main benefit of the PettingZoo API
is that it allows for individual agents to fail without terminating the entire environment.

In [15]:
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents

['EO-2', 'EO-3']
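
Because dead agents drop out of `agents`, a rollout loop can rebuild the action dictionary from the live agents on every step. The sketch below shows that loop shape against `StubParallelEnv`, a hypothetical stand-in written for this example that mimics only the `agents` attribute and the five-value `step` return of the parallel API. It is not ConstellationTasking, and the constant action `0` is a placeholder policy.

```python
class StubParallelEnv:
    """Hypothetical stand-in: each agent terminates after a fixed number of steps."""

    def __init__(self, lifetimes):
        # lifetimes maps agent name -> number of steps before it terminates
        self._remaining = dict(lifetimes)
        self.agents = list(lifetimes)

    def step(self, actions):
        if set(actions) != set(self.agents):
            raise ValueError("need exactly one action per live agent")
        obs, rewards, terminations, truncations, infos = {}, {}, {}, {}, {}
        for agent in self.agents:
            self._remaining[agent] -= 1
            obs[agent] = None
            rewards[agent] = 0.0
            terminations[agent] = self._remaining[agent] <= 0
            truncations[agent] = False
            infos[agent] = {}
        # Terminated agents are removed, so later steps no longer include them.
        self.agents = [a for a in self.agents if not terminations[a]]
        return obs, rewards, terminations, truncations, infos


stub_env = StubParallelEnv({"EO-1": 1, "EO-2": 3, "EO-3": 2})
history = []
while stub_env.agents:
    # Build actions only for agents that are still alive.
    actions = {agent: 0 for agent in stub_env.agents}
    history.append(sorted(actions))
    stub_env.step(actions)

print(history)
```

The loop keeps running with the surviving agents after the first one terminates, which is the behavior the text contrasts with the Gym API.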

In [16]:
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)

2024-09-12 15:06:04,861 gym INFO <291.00> === STARTING STEP ===
2024-09-12 15:06:04,862 sats.satellite.EO-2 INFO <291.00> EO-2: target index 7 tasked
2024-09-12 15:06:04,862 sats.satellite.EO-2 INFO <291.00> EO-2: Target(tgt-835) tasked for imaging
2024-09-12 15:06:04,863 sats.satellite.EO-2 INFO <291.00> EO-2: Target(tgt-835) window enabled: 322.9 to 532.5
2024-09-12 15:06:04,863 sats.satellite.EO-2 INFO <291.00> EO-2: setting timed terminal event at 532.5
2024-09-12 15:06:04,863 sats.satellite.EO-3 INFO <291.00> EO-3: target index 9 tasked
2024-09-12 15:06:04,863 sats.satellite.EO-3 INFO <291.00> EO-3: Target(tgt-686) tasked for imaging
2024-09-12 15:06:04,864 sats.satellite.EO-3 INFO <291.00> EO-3: Target(tgt-686) window enabled: 926.1 to 1098.0
2024-09-12 15:06:04,864 sats.satellite.EO-3 INFO <291.00> EO-3: setting timed terminal event at 1098.0
2024-09-12 15:06:04,873 sats.satellite.EO-2 INFO <332.00> EO-2: imaged Target(tgt-835)
2024-09-12 15:06:04,875 data.base INFO <332.00> Data reward: {'EO-1': 0.0, 'EO-2': 0.36859653472551324, 'EO-3': 0.0}
2024-09-12 15:06:04,876 sats.satellite.EO-2 INFO <332.00> EO-2: Satellite EO-2 requires retasking
2024-09-12 15:06:04,881 gym INFO <332.00> Step reward: {'EO-2': 0.36859653472551324, 'EO-3': 0.0}
2024-09-12 15:06:04,881 gym INFO <332.00> Episode terminated: {'EO-2': False, 'EO-3': False}
2024-09-12 15:06:04,881 gym INFO <332.00> Episode truncated: {'EO-2': False, 'EO-3': False}