API Reference
bsk_rl is a framework for creating satellite tasking reinforcement learning environments.
Three base environment classes are provided for configuring new environments:
Environment | API | Agent Count | Purpose
SatelliteTasking | Gymnasium | 1 | Single-agent training; compatible with most RL libraries.
GeneralSatelliteTasking | Gymnasium | ≥1 | Multi-agent testing; actions and observations are given in tuples.
ConstellationTasking | PettingZoo | ≥1 | Multi-agent training; compatible with multiagent RL libraries.
Environments are customized by passing keyword arguments to the environment constructor.
When using gym.make, the syntax looks like this:
env = gym.make(
"SatelliteTasking-v1",
satellite=Satellite(...),
scenario=UniformTargets(...),
...
)
In some cases (e.g. the multiprocessed Gymnasium vector environment), it is necessary for compatibility to instead register a new environment using the GeneralSatelliteTasking class and a kwargs dict.
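A minimal sketch of that registration pattern is shown below. The environment ID, import path, and keyword arguments are illustrative placeholders, not prescribed values; configure them for your own scenario.
import gymnasium as gym
from bsk_rl import GeneralSatelliteTasking  # import path assumed

# Hypothetical registration; the ID and kwargs are placeholders.
gym.register(
    id="MyConstellationTasking-v0",
    entry_point=GeneralSatelliteTasking,  # Gymnasium accepts a callable entry point
    kwargs=dict(
        satellites=[...],   # list of configured Satellite objects
        scenario=...,       # e.g. a UniformTargets scenario
        rewarder=...,       # e.g. a data-collection GlobalReward
        time_limit=5700.0,  # [s] illustrative truncation time
    ),
)
# The registered ID can then be passed to vectorized constructors such as gym.make_vec.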
See the Examples for more information on environment configuration arguments.
- class GeneralSatelliteTasking(satellites: Satellite | list[Satellite], scenario: Scenario | None = None, rewarder: GlobalReward | None = None, world_type: type[WorldModel] | None = None, world_args: dict[str, Any] | None = None, communicator: CommunicationMethod | None = None, sat_arg_randomizer: Callable[[list[Satellite]], dict[Satellite, dict[str, Any]]] | None = None, sim_rate: float = 1.0, max_step_duration: float = 1000000000.0, failure_penalty: float = -1.0, time_limit: float = inf, terminate_on_time_limit: bool = False, generate_obs_retasking_only: bool = False, log_level: int | str = 30, log_dir: str | None = None, render_mode=None)[source]
Bases: Env, Generic[SatObs, SatAct]
A Gymnasium environment adaptable to a wide range of satellite tasking problems.
These problems involve satellite(s) being tasked to complete tasks and maintain aliveness. These tasks often include rewards for data collection. The environment can be configured for any collection of satellites, including heterogeneous constellations. Other configurable aspects are the scenario (e.g. imaging targets), data collection and recording, and intersatellite communication of data.
The state space is a tuple containing the state of each satellite. Actions are assigned as a tuple of actions, one per satellite.
- Parameters:
satellites (Satellite | list[Satellite]) – Satellite(s) to be simulated. See Satellites.
scenario (Scenario | None) – Environment the satellite is acting in; contains information about targets, etc. See Scenario.
rewarder (GlobalReward | None) – Handles recording and rewarding for data collection towards objectives. See Data & Reward.
communicator (CommunicationMethod | None) – Manages communication between satellites. See Communication.
sat_arg_randomizer (Callable[[list[Satellite]], dict[Satellite, dict[str, Any]]] | None) – For correlated randomization of satellite arguments. Should be a function that takes a list of satellites and returns a dictionary mapping satellites to dictionaries of satellite model arguments to be overridden.
world_type (type[WorldModel] | None) – Type of Basilisk world model to be constructed.
world_args (dict[str, Any] | None) – Arguments for WorldModel construction. Should be in the form of a dictionary with keys corresponding to the arguments of the constructor and values that are either the desired value or a function that takes no arguments and returns a randomized value (see the sketch after this parameter list).
sim_rate (float) – [s] Rate for model simulation.
max_step_duration (float) – [s] Maximum time to propagate sim at a step. If satellites are using variable interval actions, the actual step duration will be less than or equal to this value.
failure_penalty (float) – Reward for satellite failure. Should be nonpositive.
time_limit (float) – [s] Time at which to truncate the simulation.
terminate_on_time_limit (bool) – Send the terminated signal at time_limit instead of just truncated.
generate_obs_retasking_only (bool) – If True, only generate observations for satellites that require retasking. All other satellites will receive an observation of zeros.
log_level (int | str) – Logging level for the environment. Default is WARNING.
log_dir (str | None) – Directory to write logs to in addition to the console.
render_mode – Unused.
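As a hedged illustration of the world_args convention described above, the sketch below mixes a fixed value with a zero-argument randomization function. The dictionary keys (utc_init, priority_scale) are hypothetical placeholders rather than documented WorldModel arguments, and the satellite configuration is elided.
import numpy as np
from bsk_rl import GeneralSatelliteTasking  # import path assumed

# Hypothetical world_args; keys are illustrative placeholders, not guaranteed
# WorldModel constructor arguments.
world_args = dict(
    utc_init="2024 JAN 01 00:00:00.000 (UTC)",            # fixed value, reused every reset
    priority_scale=lambda: np.random.uniform(0.5, 1.5),   # re-sampled on every reset
)

env = GeneralSatelliteTasking(
    satellites=[Satellite(...)],   # placeholder satellite configuration
    world_args=world_args,
    sim_rate=1.0,                  # [s]
    max_step_duration=600.0,       # [s]
    time_limit=5700.0,             # [s]
)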
- reset(seed: int | None = None, options=None) tuple[tuple[SatObs, ...], dict[str, Any]] [source]
Reconstruct the simulator and reset the scenario.
Satellite and world arguments get randomized on reset if GeneralSatelliteTasking.world_args or Satellite.sat_args includes randomization functions.
Certain classes in bsk_rl have a reset_pre_sim_init and/or reset_post_sim_init method. These methods are called before and after the new Basilisk Simulator is created, respectively. They allow for reset actions that feed into the underlying simulation, and for those that depend on the underlying simulation, to be performed.
- Parameters:
seed (int | None) – Gymnasium environment seed.
options – Unused.
- Returns:
observation, info
- Return type:
tuple[tuple[SatObs, …], dict[str, Any]]
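For example, a seeded reset might look like the brief sketch below (assuming an already-constructed environment):
# Each reset reconstructs the Basilisk simulator; any randomization functions in
# world_args or sat_args are re-evaluated to produce a fresh scenario.
observation, info = env.reset(seed=42)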
- delete_simulator()[source]
Delete Basilisk objects.
Only the simulator contains strong references to BSK models, so deleting it will delete all Basilisk objects. Enable debug-level logging to verify that the simulator, FSW, dynamics, and world models are all deleted on reset.
- property action_space: Space[Iterable[SatAct]]
Compose satellite action spaces into a tuple.
- Returns:
Joint action space
- property observation_space: Space[tuple[SatObs, ...]]
Compose satellite observation spaces into a tuple.
Note: calls reset(), which can be expensive, to determine observation size.
- Returns:
Joint observation space
- step(actions: Iterable[SatAct]) tuple[tuple[SatObs, ...], float, bool, bool, dict[str, Any]] [source]
Propagate the simulation, update information, and get rewards.
- Parameters:
actions (Iterable[SatAct]) – Joint action for satellites
- Returns:
observation, reward, terminated, truncated, info
- Return type:
tuple[tuple[SatObs, …], float, bool, bool, dict[str, Any]]
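A hedged sketch of an interaction loop using the joint-action tuple API follows; the sampled actions are placeholders for a policy's output.
# Minimal interaction loop; random actions stand in for a trained policy.
observation, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    # One action per satellite, in the same order as the satellites argument.
    actions = tuple(env.action_space.sample())
    observation, reward, terminated, truncated, info = env.step(actions)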
- class SatelliteTasking(satellite: Satellite, *args, **kwargs)[source]
Bases: GeneralSatelliteTasking, Generic[SatObs, SatAct]
A special case of GeneralSatelliteTasking for one satellite.
For compatibility with standard training APIs, actions and observations are directly exposed for the single satellite as opposed to being wrapped in a tuple.
- Parameters:
satellite (Satellite) – Satellite to be simulated.
*args – Passed to GeneralSatelliteTasking.
**kwargs – Passed to GeneralSatelliteTasking.
- property action_space: Space[SatAct]
Return the single satellite action space.
- property observation_space: Box
Return the single satellite observation space.
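As a brief sketch of the direct (non-tuple) interface, assuming the SatelliteTasking-v1 registration shown earlier and placeholder satellite and scenario arguments:
import gymnasium as gym

# Single-satellite environment; observations and actions are not wrapped in tuples.
env = gym.make(
    "SatelliteTasking-v1",
    satellite=Satellite(...),      # one Satellite, not a list
    scenario=UniformTargets(...),
    time_limit=5700.0,             # [s] illustrative value
)
observation, info = env.reset(seed=0)
action = env.action_space.sample()   # a single satellite's action
observation, reward, terminated, truncated, info = env.step(action)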
- class ConstellationTasking(*args, **kwargs)[source]
Bases: GeneralSatelliteTasking, ParallelEnv, Generic[SatObs, SatAct, AgentID]
Implements the PettingZoo parallel API for the GeneralSatelliteTasking environment.
- Parameters:
*args – Passed to GeneralSatelliteTasking.
**kwargs – Passed to GeneralSatelliteTasking.
- reset(seed: int | None = None, options=None) tuple[tuple[SatObs, ...], dict[str, Any]] [source]
Reset the environment and return PettingZoo Parallel API format.
- Parameters:
seed (int | None)
- Return type:
tuple[tuple[SatObs, …], dict[str, Any]]
- property agents: list[AgentID]
Agents currently in the environment.
- property num_agents: int
Number of agents currently in the environment.
- property possible_agents: list[AgentID]
Return the list of all possible agents.
- property max_num_agents: int
Maximum number of agents possible in the environment.
- property previously_dead: list[AgentID]
Return the list of agents that died at least one step ago.
- property observation_spaces: dict[AgentID, Box]
Return the observation space for each agent.
- observation_space(agent: AgentID) Space[SatObs] [source]
Return the observation space for a certain agent.
- Parameters:
agent (AgentID)
- Return type:
Space[SatObs]
- property action_spaces: dict[AgentID, Space[SatAct]]
Return the action space for each agent.
- action_space(agent: AgentID) Space[SatAct] [source]
Return the action space for a certain agent.
- Parameters:
agent (AgentID)
- Return type:
Space[SatAct]
- step(actions: dict[AgentID, SatAct]) tuple[dict[AgentID, SatObs], dict[AgentID, float], dict[AgentID, bool], dict[AgentID, bool], dict[AgentID, dict]] [source]
Step the environment and return PettingZoo Parallel API format.
- Parameters:
actions (dict[AgentID, SatAct])
- Return type:
tuple[dict[AgentID, SatObs], dict[AgentID, float], dict[AgentID, bool], dict[AgentID, bool], dict[AgentID, dict]]
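A hedged sketch of a PettingZoo parallel-API loop for this environment follows; the satellite and scenario configuration are placeholders, and random actions stand in for per-agent policies.
from bsk_rl import ConstellationTasking  # import path assumed

env = ConstellationTasking(
    satellites=[Satellite(...), Satellite(...)],  # placeholder constellation
    scenario=UniformTargets(...),
    time_limit=5700.0,                            # [s] illustrative value
)
observations, infos = env.reset(seed=0)
while env.agents:
    # One action per currently-alive agent, keyed by agent ID.
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)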