# Multi-Agent Environments

Two multi-agent environments are provided in the package:

* [GeneralSatelliteTasking](../api_reference/index.rst#bsk_rl.GeneralSatelliteTasking), 
 a [Gymnasium](https://gymnasium.farama.org)-based environment and the basis for all other environments.
* [ConstellationTasking](../api_reference/index.rst#bsk_rl.ConstellationTasking), which
 implements the [PettingZoo parallel API](https://pettingzoo.farama.org/api/parallel/).

The latter is generally preferable for multi-agent reinforcement learning (MARL), since most
MARL algorithms are designed around this kind of API.

## Configuring the Environment

This example uses a multi-satellite target imaging environment in which the goal is to
maximize the value of unique images taken.
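The "unique images" objective can be pictured with a small toy model. This is a hypothetical sketch, not the `data.UniqueImageReward` implementation. It assumes each target carries a priority and that only the first image of a target, taken by any satellite, earns that priority. Later images of the same target earn nothing. The target names and priorities are made up.

```python
def unique_image_reward(
    images: list[tuple[str, str]], priority: dict[str, float]
) -> dict[str, float]:
    """Credit each satellite with the priority of targets nobody imaged before.

    `images` is a chronological list of (satellite, target) pairs; this helper
    is hypothetical and only illustrates the "unique images" idea.
    """
    seen: set[str] = set()
    reward: dict[str, float] = {}
    for sat, target in images:
        reward.setdefault(sat, 0.0)
        if target not in seen:  # only the first image of a target counts
            seen.add(target)
            reward[sat] += priority[target]
    return reward


images = [("EO-1", "T1"), ("EO-2", "T1"), ("EO-2", "T2"), ("EO-3", "T1")]
priority = {"T1": 0.8, "T2": 0.5}
print(unique_image_reward(images, priority))
# {'EO-1': 0.8, 'EO-2': 0.5, 'EO-3': 0.0}
```

Because duplicate images are worthless under this rule, satellites benefit from spreading out over different targets rather than all imaging the same high-priority one.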

As usual, the satellite type is defined first.

```python
from bsk_rl import sats, act, obs, scene, data, comm
from bsk_rl.sim import dyn, fsw

class ImagingSatellite(sats.ImagingSatellite):
    observation_spec = [
        # Priority and normalized opening time of the next 10 imaging opportunities
        obs.OpportunityProperties(
            dict(prop="priority"),
            dict(prop="opportunity_open", norm=5700.0),
            n_ahead_observe=10,
        )
    ]
    # Choose one of the next 10 upcoming targets to image
    action_spec = [act.Image(n_ahead_image=10)]
    dyn_type = dyn.FullFeaturedDynModel
    fsw_type = fsw.SteeringImagerFSWModel
```

Satellite properties are set to give each satellite near-unlimited power and storage resources. To randomize some parameters in a correlated manner across satellites, a `sat_arg_randomizer` is created and passed to the environment. Here, the satellites are distributed in a simple single-plane Walker-delta constellation.

```python
from bsk_rl.utils.orbital import walker_delta_args

sat_args = dict(
    # Attitude and rate error tolerances for imaging
    imageAttErrorRequirement=0.01,
    imageRateErrorRequirement=0.01,
    # Near-unlimited power and data storage
    batteryStorageCapacity=1e9,
    storedCharge_Init=1e9,
    dataStorageCapacity=1e12,
    # Attitude control limits and gains
    u_max=0.4,
    K1=0.25,
    K3=3.0,
    omega_max=0.087,
    servo_Ki=5.0,
    servo_P=150 / 5,
)
# All satellites share a single orbital plane
sat_arg_randomizer = walker_delta_args(altitude=800.0, inc=60.0, n_planes=1)
```
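For intuition about the resulting geometry, here is a minimal sketch of the standard Walker-delta pattern i: t/p/f. In this pattern, t satellites are split evenly over p planes, and the planes are spaced evenly in right ascension of the ascending node (RAAN). Within each plane, satellites are spaced evenly in argument of latitude, and a phasing factor f offsets neighboring planes. `walker_delta_phases` is a hypothetical helper. It is not how `walker_delta_args` is implemented, and it ignores any randomization that function performs.

```python
def walker_delta_phases(
    n_sats: int, n_planes: int, phasing: int = 0
) -> list[tuple[float, float]]:
    """Return nominal (RAAN, argument of latitude) pairs in degrees.

    Hypothetical illustration of a Walker-delta t/p/f pattern; not bsk_rl code.
    """
    if n_sats % n_planes != 0:
        raise ValueError("n_sats must be divisible by n_planes")
    per_plane = n_sats // n_planes
    elements = []
    for plane in range(n_planes):
        raan = 360.0 * plane / n_planes  # planes evenly spaced in RAAN
        for slot in range(per_plane):
            # Even in-plane spacing plus the inter-plane phase offset
            arg_lat = (360.0 * slot / per_plane + 360.0 * phasing * plane / n_sats) % 360.0
            elements.append((raan, arg_lat))
    return elements


# Three satellites in one plane end up 120 degrees apart
print(walker_delta_phases(3, 1))  # [(0.0, 0.0), (0.0, 120.0), (0.0, 240.0)]
```

With `n_planes=1` the phasing factor has no effect, which is why the single-plane case above is the trivial one.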

## Gym API

GeneralSatelliteTasking represents actions and observations as tuples, with one entry per
satellite.

```python
from bsk_rl import GeneralSatelliteTasking

env = GeneralSatelliteTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    # LOSCommunication requires the dynamics model to inherit from
    # dyn.LOSCommDynModel (FullFeaturedDynModel does)
    communicator=comm.LOSCommunication(),
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_space
```

```python
env.action_space
```

Accordingly, actions are passed as a tuple (or list), one per satellite. Each call to `step`
runs the simulation until the first time any satellite completes its action.
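The stopping rule can be illustrated with a toy model in which each satellite has a known amount of time left on its current action. `advance_to_first_completion` and the durations are hypothetical. The real environment advances the Basilisk simulation rather than subtracting numbers.

```python
def advance_to_first_completion(
    remaining: dict[str, float],
) -> tuple[float, list[str]]:
    """Advance until the first satellite finishes its current action.

    `remaining` maps a satellite name to the seconds left on its action and is
    updated in place. Returns the elapsed time and the satellites that finished.
    """
    elapsed = min(remaining.values())  # the step ends at the earliest completion
    finished = [name for name, t in remaining.items() if t == elapsed]
    for name in remaining:
        remaining[name] -= elapsed
    return elapsed, finished


remaining = {"EO-1": 120.0, "EO-2": 45.0, "EO-3": 300.0}
elapsed, finished = advance_to_first_completion(remaining)
print(elapsed, finished, remaining)
# 45.0 ['EO-2'] {'EO-1': 75.0, 'EO-2': 0.0, 'EO-3': 255.0}
```

Satellites that have not finished simply carry their remaining time into the next step.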

```python
observation, reward, terminated, truncated, info = env.step([7, 9, 8])
```

```python
observation
```

At this point, either every satellite can be retasked, or individual satellites can continue
their previous action by passing `None` as their action. To see which satellites must be
retasked (i.e., their previous action is done and they have nothing more to do), check
`"requires_retasking"` in each satellite's info.
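In a toy model where each satellite tracks the time left on its current action, passing `None` leaves that time untouched, while an integer starts a new action. The helpers, the "time left reaches zero" retasking rule, and the action durations below are hypothetical. They only mirror the behavior described above.

```python
from typing import Optional


def requires_retasking(remaining: dict[str, float]) -> dict[str, bool]:
    """Flag satellites whose current action is done (hypothetical rule)."""
    return {name: t <= 0.0 for name, t in remaining.items()}


def apply_actions(
    remaining: dict[str, float],
    actions: list[Optional[int]],
    durations: dict[int, float],
) -> dict[str, float]:
    """Return updated time-left per satellite after applying one action each.

    None keeps a satellite's current action; an integer starts a new action
    whose made-up duration is looked up in `durations`.
    """
    updated = dict(remaining)
    for name, action in zip(remaining, actions):
        if action is not None:
            updated[name] = durations[action]
    return updated


remaining = {"EO-1": 75.0, "EO-2": 0.0, "EO-3": 255.0}
flags = requires_retasking(remaining)
actions = [0 if flags[name] else None for name in remaining]
print(actions)  # [None, 0, None]
print(apply_actions(remaining, actions, durations={0: 60.0}))
# {'EO-1': 75.0, 'EO-2': 60.0, 'EO-3': 255.0}
```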

```python
info
```

Based on this information, only the satellites that need it are retasked here; the others are passed `None`.

```python
actions = [
    0 if info[sat.name]["requires_retasking"] else None
    for sat in env.unwrapped.satellites
]
actions
```

```python
observation, reward, terminated, truncated, info = env.step(actions)
```

In GeneralSatelliteTasking, the entire episode terminates if any satellite dies. To
demonstrate this, satellite 0 is forcibly killed by replacing its `is_alive` check with
one that first reports an empty battery.

```python
from Basilisk.architecture import messaging

def isnt_alive(log_failure=False):
    """Mock satellite 0 dying by writing an empty-battery status message."""
    self = env.unwrapped.satellites[0]
    death_message = messaging.PowerStorageStatusMsgPayload()
    death_message.storageLevel = 0.0
    self.dynamics.powerMonitor.batPowerOutMsg.write(death_message)
    return self.dynamics.is_alive(log_failure=log_failure) and self.fsw.is_alive(
        log_failure=log_failure
    )

# Swap in the mock liveness check, then step
env.unwrapped.satellites[0].is_alive = isnt_alive
observation, reward, terminated, truncated, info = env.step([6, 7, 9])
```


## PettingZoo API

ConstellationTasking, the [PettingZoo parallel API](https://pettingzoo.farama.org/api/parallel/)
environment, is largely the same as GeneralSatelliteTasking; see the PettingZoo
documentation for a full description of the API. The main difference is that per-agent
quantities (observations, actions, rewards, and so on) are stored in dictionaries keyed
by agent rather than in tuples.
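As a minimal sketch of this difference in data layout, the hypothetical helpers below convert a Gym-style tuple into a dictionary keyed by agent, and back again. The example assumes the agent names are the satellite names EO-1 to EO-3 used above.

```python
from typing import Any, Sequence


def to_agent_dict(agents: Sequence[str], values: Sequence[Any]) -> dict[str, Any]:
    """Key a tuple of per-satellite values by agent name (hypothetical helper)."""
    if len(agents) != len(values):
        raise ValueError("need exactly one value per agent")
    return dict(zip(agents, values))


def to_tuple(agents: Sequence[str], per_agent: dict[str, Any]) -> tuple:
    """Order a per-agent dictionary as a tuple following `agents`."""
    return tuple(per_agent[agent] for agent in agents)


agents = ["EO-1", "EO-2", "EO-3"]
actions = to_agent_dict(agents, (7, 9, 8))
print(actions)                    # {'EO-1': 7, 'EO-2': 9, 'EO-3': 8}
print(to_tuple(agents, actions))  # (7, 9, 8)
```

Keying by name means an entry no longer depends on a satellite's position. This matters once agents can drop out of the environment individually.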

```python
from bsk_rl import ConstellationTasking

env = ConstellationTasking(
    satellites=[
        ImagingSatellite("EO-1", sat_args),
        ImagingSatellite("EO-2", sat_args),
        ImagingSatellite("EO-3", sat_args),
    ],
    scenario=scene.UniformTargets(1000),
    rewarder=data.UniqueImageReward(),
    # LOSCommunication requires the dynamics model to inherit from
    # dyn.LOSCommDynModel (FullFeaturedDynModel does)
    communicator=comm.LOSCommunication(),
    sat_arg_randomizer=sat_arg_randomizer,
    log_level="INFO",
)
env.reset()

env.observation_spaces
```

```python
env.action_spaces
```

Actions are passed as a dictionary; the agent names can be accessed through the `agents`
property.
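A common PettingZoo pattern is to build the action dictionary by iterating over `agents`. The sketch below samples a random action for each current agent. It assumes that ConstellationTasking exposes the standard PettingZoo parallel method `action_space(agent)`, returning a space with a `sample()` method. `sample_actions` itself is a hypothetical helper.

```python
def sample_actions(env) -> dict:
    """Return {agent: random action} for every agent currently in `env`.

    Relies only on the PettingZoo parallel API: `env.agents` and
    `env.action_space(agent)`, whose returned space provides `sample()`.
    """
    return {agent: env.action_space(agent).sample() for agent in env.agents}

# Usage with the ConstellationTasking environment created above:
# observation, reward, terminated, truncated, info = env.step(sample_actions(env))
```

Iterating over `env.agents`, rather than hard-coding indices, keeps the dictionary valid when the set of agents changes during an episode.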

```python
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
        env.agents[2]: 8,
    }
)
```

```python
observation
```

Beyond compatibility with MARL algorithms, the main benefit of the PettingZoo API is that
individual agents can fail without terminating the entire environment.
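The two termination behaviors can be contrasted with plain dictionaries. This sketch assumes the usual PettingZoo convention that an agent leaves `agents` once it is terminated or truncated. `live_agents` is a hypothetical helper, and the flags are made up.

```python
def live_agents(
    agents: list[str], terminated: dict[str, bool], truncated: dict[str, bool]
) -> list[str]:
    """Agents still active after a parallel step (PettingZoo convention)."""
    return [a for a in agents if not (terminated[a] or truncated[a])]


agents = ["EO-1", "EO-2", "EO-3"]
terminated = {"EO-1": True, "EO-2": False, "EO-3": False}  # EO-1 died this step
truncated = {a: False for a in agents}

# Gym-style (GeneralSatelliteTasking): one death ends the whole episode.
episode_over = any(terminated.values())
# PettingZoo-style (ConstellationTasking): only the dead agent drops out.
remaining = live_agents(agents, terminated, truncated)
print(episode_over, remaining)  # True ['EO-2', 'EO-3']
```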

```python
# Immediately kill satellite 0
env.unwrapped.satellites[0].is_alive = isnt_alive
env.agents
```

Since satellite 0 is no longer alive, actions are passed only for the two remaining agents.

```python
observation, reward, terminated, truncated, info = env.step(
    {
        env.agents[0]: 7,
        env.agents[1]: 9,
    }
)
```