API Reference

bsk_rl is a framework for creating satellite tasking reinforcement learning environments. Three base environment classes are provided for configuring new environments:

Environment              API         Agent Count  Purpose
SatelliteTasking         Gymnasium   1            Single-agent training; compatible with most RL libraries.
GeneralSatelliteTasking  Gymnasium   ≥1           Multi-agent testing; actions and observations are given in tuples.
ConstellationTasking     PettingZoo  ≥1           Multi-agent training; compatible with multiagent RL libraries.

Environments are customized by passing keyword arguments to the environment constructor. When using gym.make, the syntax looks like this:

env = gym.make(
    "SatelliteTasking-v1",
    satellite=Satellite(...),
    scenario=UniformTargets(...),
    ...
)

In some cases (e.g. the multiprocessed Gymnasium vector environment), compatibility requires instead registering a new environment using the GeneralSatelliteTasking class and a kwargs dict.
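
For example, a preconfigured environment can be registered under a new name and then constructed with gym.make. The sketch below is illustrative: the environment name, import path, and keyword values are assumptions, and the Satellite and UniformTargets constructors follow the placeholder style of the example above.

import gymnasium as gym
from bsk_rl import GeneralSatelliteTasking

gym.register(
    id="CustomSatelliteTasking-v0",        # illustrative name
    entry_point=GeneralSatelliteTasking,   # gymnasium also accepts a class as the entry point
    kwargs=dict(
        satellites=[Satellite(...), Satellite(...)],
        scenario=UniformTargets(...),
        time_limit=5700.0,
    ),
)

env = gym.make("CustomSatelliteTasking-v0")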

See the Examples for more information on environment configuration arguments.

class GeneralSatelliteTasking(satellites: Satellite | list[Satellite], scenario: Scenario | None = None, rewarder: GlobalReward | None = None, world_type: type[WorldModel] | None = None, world_args: dict[str, Any] | None = None, communicator: CommunicationMethod | None = None, sat_arg_randomizer: Callable[[list[Satellite]], dict[Satellite, dict[str, Any]]] | None = None, sim_rate: float = 1.0, max_step_duration: float = 1000000000.0, failure_penalty: float = -1.0, time_limit: float = inf, terminate_on_time_limit: bool = False, generate_obs_retasking_only: bool = False, log_level: int | str = 30, log_dir: str | None = None, render_mode=None)

Bases: Env, Generic[SatObs, SatAct]

A Gymnasium environment adaptable to a wide range of satellite tasking problems.

These problems involve one or more satellites being tasked to complete tasks while maintaining aliveness; these tasks often include rewards for data collection. The environment can be configured for any collection of satellites, including heterogeneous constellations. Other configurable aspects are the scenario (e.g. imaging targets), data collection and recording, and intersatellite communication of data.

The observation space is a tuple containing each satellite's observation, and actions are assigned as a tuple of actions, one per satellite.
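
A minimal sketch of the tuple-based interface, with placeholder constructor arguments; the integer actions assume each satellite exposes a discrete action space:

env = GeneralSatelliteTasking(
    satellites=[Satellite(...), Satellite(...)],
    scenario=UniformTargets(...),
)
observations, info = env.reset()  # tuple of per-satellite observations
# One action per satellite, ordered consistently with the satellites list
observations, reward, terminated, truncated, info = env.step((0, 0))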

Parameters:
  • satellites (Satellite | list[Satellite]) – Satellite(s) to be simulated. See Satellites.

  • scenario (Scenario | None) – Environment the satellite is acting in; contains information about targets, etc. See Scenario.

  • rewarder (GlobalReward | None) – Handles recording and rewarding for data collection towards objectives. See Data & Reward.

  • communicator (CommunicationMethod | None) – Manages communication between satellites. See Communication.

  • sat_arg_randomizer (Callable[[list[Satellite]], dict[Satellite, dict[str, Any]]] | None) – For correlated randomization of satellite arguments. Should be a function that takes a list of satellites and returns a dictionary that maps satellites to dictionaries of satellite model arguments to be overridden. See the sketch after this parameter list.

  • world_type (type[WorldModel] | None) – Type of Basilisk world model to be constructed.

  • world_args (dict[str, Any] | None) – Arguments for WorldModel construction. Should be a dictionary with keys corresponding to the arguments of the constructor and values that are either the desired value or a zero-argument function that returns a randomized value. See the sketch after this parameter list.

  • sim_rate (float) – [s] Rate for model simulation.

  • max_step_duration (float) – [s] Maximum time to propagate sim at a step. If satellites are using variable interval actions, the actual step duration will be less than or equal to this value.

  • failure_penalty (float) – Reward for satellite failure. Should be nonpositive.

  • time_limit (float) – [s] Time at which to truncate the simulation.

  • terminate_on_time_limit (bool) – Send a termination signal at time_limit instead of only a truncation signal.

  • generate_obs_retasking_only (bool) – If True, only generate observations for satellites that require retasking. All other satellites will receive an observation of zeros.

  • log_level (int | str) – Logging level for the environment. Default is WARNING.

  • log_dir (str | None) – Directory to write logs to in addition to the console.

  • render_mode – Unused.
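
As referenced in the parameter descriptions above, world_args values and sat_arg_randomizer support randomization on reset. The sketch below is illustrative: "utc_init" and "example_arg" are assumed argument names, and valid keys depend on the world, dynamics, and FSW models in use.

import numpy as np

def random_utc_init():
    # world_args values may be zero-argument callables, re-evaluated on each reset.
    # "utc_init" is assumed here to be a valid world argument for illustration.
    return "2023 JAN 01 0%i:00:00.000 (UTC)" % np.random.randint(0, 10)

def correlated_sat_args(satellites):
    # sat_arg_randomizer receives the list of satellites and returns a mapping
    # from each satellite to a dict of satellite argument overrides; here a
    # single randomized value is shared across all satellites.
    shared = np.random.uniform(0.0, 1.0)
    return {sat: {"example_arg": shared} for sat in satellites}

env = GeneralSatelliteTasking(
    satellites=[Satellite(...), Satellite(...)],
    world_args={"utc_init": random_utc_init},
    sat_arg_randomizer=correlated_sat_args,
)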

reset(seed: int | None = None, options=None) → tuple[tuple[SatObs, ...], dict[str, Any]]

Reconstruct the simulator and reset the scenario.

Satellite and world arguments are re-randomized on reset if GeneralSatelliteTasking.world_args or Satellite.sat_args include randomization functions.

Certain classes in bsk_rl provide a reset_pre_sim_init and/or reset_post_sim_init method, called respectively before and after the new Basilisk simulator is created. This allows reset behavior that feeds into the underlying simulation as well as behavior that depends on it.

Parameters:
  • seed (int | None) – Gymnasium environment seed.

  • options – Unused.

Returns:

observation, info

Return type:

tuple[tuple[SatObs, …], dict[str, Any]]

delete_simulator()

Delete Basilisk objects.

Only the simulator contains strong references to BSK models, so deleting it will delete all Basilisk objects. Enable debug-level logging to verify that the simulator, FSW, dynamics, and world models are all deleted on reset.
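
For example, debug-level logging can be requested through the log_level argument when constructing the environment:

# Debug logs report deletion of the simulator, FSW, dynamics, and world models on reset
env = gym.make("SatelliteTasking-v1", satellite=Satellite(...), log_level="DEBUG")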

property action_space: Space[Iterable[SatAct]]

Compose satellite action spaces into a tuple.

Returns:

Joint action space

property observation_space: Space[tuple[SatObs, ...]]

Compose satellite observation spaces into a tuple.

Note: calls reset(), which can be expensive, to determine observation size.

Returns:

Joint observation space

step(actions: Iterable[SatAct]) → tuple[tuple[SatObs, ...], float, bool, bool, dict[str, Any]]

Propagate the simulation, update information, and get rewards.

Parameters:

actions (Iterable[SatAct]) – Joint action for satellites

Returns:

observation, reward, terminated, truncated, info

Return type:

tuple[tuple[SatObs, …], float, bool, bool, dict[str, Any]]

render() → None

No rendering implemented.

Return type:

None

close() → None

Try to cleanly delete everything.

Return type:

None

class SatelliteTasking(satellite: Satellite, *args, **kwargs)

Bases: GeneralSatelliteTasking, Generic[SatObs, SatAct]

A special case of GeneralSatelliteTasking for one satellite.

For compatibility with standard training APIs, actions and observations are directly exposed for the single satellite as opposed to being wrapped in a tuple.
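
A minimal interaction loop, with placeholder constructor arguments in the style of the module-level example:

import gymnasium as gym

env = gym.make(
    "SatelliteTasking-v1",
    satellite=Satellite(...),
    scenario=UniformTargets(...),
)
observation, info = env.reset()
# A single action (not a tuple) tasks the single satellite
observation, reward, terminated, truncated, info = env.step(env.action_space.sample())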

Parameters:
  • satellite (Satellite) – Satellite to be simulated. See Satellites.

property action_space: Space[SatAct]

Return the single satellite action space.

property observation_space: Box

Return the single satellite observation space.

property satellite: Satellite

Satellite being tasked.

step(action) → tuple[Any, float, bool, bool, dict[str, Any]]

Task the satellite with a single action.

Return type:

tuple[Any, float, bool, bool, dict[str, Any]]

class ConstellationTasking(*args, **kwargs)

Bases: GeneralSatelliteTasking, ParallelEnv, Generic[SatObs, SatAct, AgentID]

Implements the PettingZoo parallel API for the GeneralSatelliteTasking environment.
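
A minimal sketch of the Parallel API interaction, with placeholder constructor arguments; the zero-valued actions assume discrete per-satellite action spaces:

env = ConstellationTasking(
    satellites=[Satellite(...), Satellite(...)],
    scenario=UniformTargets(...),
)
observations, infos = env.reset()             # dicts keyed by agent ID
actions = {agent: 0 for agent in env.agents}
observations, rewards, terminations, truncations, infos = env.step(actions)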

reset(seed: int | None = None, options=None) → tuple[tuple[SatObs, ...], dict[str, Any]]

Reset the environment and return PettingZoo Parallel API format.

Parameters:

seed (int | None)

Return type:

tuple[tuple[SatObs, …], dict[str, Any]]

property agents: list[AgentID]

Agents currently in the environment.

property num_agents: int

Number of agents currently in the environment.

property possible_agents: list[AgentID]

Return the list of all possible agents.

property max_num_agents: int

Maximum number of agents possible in the environment.

property previously_dead: list[AgentID]

Return the list of agents that died at least one step ago.

property observation_spaces: dict[AgentID, Box]

Return the observation space for each agent.

observation_space(agent: AgentID) → Space[SatObs]

Return the observation space for a certain agent.

Parameters:

agent (AgentID)

Return type:

Space[SatObs]

property action_spaces: dict[AgentID, Space[SatAct]]

Return the action space for each agent.

action_space(agent: AgentID) → Space[SatAct]

Return the action space for a certain agent.

Parameters:

agent (AgentID)

Return type:

Space[SatAct]

step(actions: dict[AgentID, SatAct]) → tuple[dict[AgentID, SatObs], dict[AgentID, float], dict[AgentID, bool], dict[AgentID, bool], dict[AgentID, dict]]

Step the environment and return PettingZoo Parallel API format.

Parameters:

actions (dict[AgentID, SatAct])

Return type:

tuple[dict[AgentID, SatObs], dict[AgentID, float], dict[AgentID, bool], dict[AgentID, bool], dict[AgentID, dict]]