API Reference

bsk_rl is a framework for creating satellite tasking reinforcement learning environments. Three base environment classes are provided for configuring new environments:

Environment	API	Agent Count	Purpose
`SatelliteTasking`	Gymnasium	1	Single-agent training; compatible with most RL libraries.
`GeneralSatelliteTasking`	Gymnasium	≥1	Multi-agent testing; actions and observations are given in tuples.
`ConstellationTasking`	PettingZoo	≥1	Multi-agent training; compatible with multiagent RL libraries.

Environments are customized by passing keyword arguments to the environment constructor. When using gym.make, the syntax looks like this:

import gymnasium as gym

env = gym.make(
    "SatelliteTasking-v1",
    satellite=Satellite(...),
    scenario=UniformTargets(...),
    ...
)

In some cases (e.g. the multiprocessed Gymnasium vector environment), it is necessary for compatibility to instead register a new environment using the GeneralSatelliteTasking class and a kwargs dict.
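A minimal sketch of that registration pattern, assuming Gymnasium's `register` function with its `id`, `entry_point`, and `kwargs` parameters. The helper names `build_env_kwargs` and `register_tasking_env` are hypothetical, and the environment class, satellites, and scenario are left for the caller to supply.

```python
from typing import Any


def build_env_kwargs(satellites: Any, scenario: Any, **overrides: Any) -> dict[str, Any]:
    """Collect constructor keyword arguments into a plain dict."""
    env_kwargs = {"satellites": satellites, "scenario": scenario}
    env_kwargs.update(overrides)
    return env_kwargs


def register_tasking_env(env_id: str, env_class: type, env_kwargs: dict[str, Any]) -> None:
    """Register env_class under env_id so gym.make(env_id) builds it from env_kwargs."""
    # Imported lazily so build_env_kwargs stays dependency-free.
    import gymnasium as gym

    gym.register(id=env_id, entry_point=env_class, kwargs=env_kwargs)
```

Under these assumptions, a call such as `register_tasking_env("CustomSatelliteTasking-v0", GeneralSatelliteTasking, env_kwargs)` would make the environment available by ID; the ID shown is a placeholder.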

See the Examples for more information on environment configuration arguments.

class GeneralSatelliteTasking(satellites: Satellite | list[Satellite], scenario: Scenario | None = None, rewarder: GlobalReward | list[GlobalReward] | None = None, world_type: type[WorldModel] | None = None, world_args: dict[str, Any] | None = None, communicator: CommunicationMethod | None = None, sat_arg_randomizer: Callable[[list[Satellite]], dict[Satellite, dict[str, Any]]] | None = None, sim_rate: float = 1.0, max_step_duration: float = 1000000000.0, failure_penalty: float = -1.0, time_limit: float | Callable = inf, terminate_on_time_limit: bool = False, generate_obs_retasking_only: bool = False, dtype: dtype | None = None, log_level: int | str = 30, log_dir: str | None = None, vizard_dir: str | None = None, vizard_settings: dict[str, Any] | None = None, render_mode=None)[source]

Bases: Env, Generic[SatObs, SatAct]

A Gymnasium environment adaptable to a wide range of satellite tasking problems.

These problems involve satellite(s) being tasked to complete tasks and maintain aliveness. These tasks often include rewards for data collection. The environment can be configured for any collection of satellites, including heterogeneous constellations. Other configurable aspects are the scenario (e.g. imaging targets), data collection and recording, and intersatellite communication of data.

The state space is a tuple containing the state of each satellite. Actions are assigned as a tuple of actions, one per satellite.

Parameters:

satellites (Satellite | list[Satellite]) – Satellite(s) to be simulated. See Satellites.
scenario (Scenario | None) – Environment the satellite is acting in; contains information about targets, etc. See Scenario.
rewarder (GlobalReward | list[GlobalReward] | None) – Handles recording and rewarding for data collection towards objectives. Can be a single rewarder or a list of multiple rewarders. See Data & Reward.
communicator (CommunicationMethod | None) – Manages communication between satellites. See Communication.
sat_arg_randomizer (Callable[[list[Satellite]], dict[Satellite, dict[str, Any]]] | None) – For correlated randomization of satellite arguments. Should be a function that takes a list of satellites and returns a dictionary that maps satellites to dictionaries of satellite model arguments to be overridden.
world_type (type[WorldModel] | None) – Type of Basilisk world model to be constructed.
world_args (dict[str, Any] | None) – Arguments for WorldModel construction. Should be in the form of a dictionary with keys corresponding to the arguments of the constructor and values that are either the desired value or a function that takes no arguments and returns a randomized value.
sim_rate (float) – [s] Rate for model simulation.
max_step_duration (float) – [s] Maximum time to propagate sim at a step. If satellites are using variable interval actions, the actual step duration will be less than or equal to this value. It is preferable to set durations in the actions themselves.
failure_penalty (float) – Reward for satellite failure. Should be nonpositive.
time_limit (float | Callable) – [s] Time at which to truncate the simulation. Can also be a function that takes no arguments and returns a float. This function will be called every time the environment is reset to randomize the time limit.
terminate_on_time_limit (bool) – Send a termination signal at time_limit instead of just a truncation.
generate_obs_retasking_only (bool) – If True, only generate observations for satellites that require retasking. All other satellites will receive an observation of zeros.
dtype (dtype | None) – Data type for satellite observations. If None, the data type specified in the satellite is used.
log_level (int | str) – Logging level for the environment. Default is WARNING.
log_dir (str | None) – Directory to write logs to in addition to the console.
vizard_dir (str | None) – Path to save Vizard visualization files. If None, no Vizard-related modules will be imported.
vizard_settings (dict[str, Any] | None) – Settings for Vizard visualization. Set in vizInstance.settings. Additionally, the key vizard_rate can be set to the rate at which Vizard updates. Valid settings can be found here.
render_mode – Unused.
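Several of these parameters accept callables in place of fixed values. The sketch below illustrates those conventions with stand-ins only: `resolve_args` is a hypothetical helper mimicking how a `world_args`-style dictionary could be resolved, `FakeSatellite` stands in for `Satellite`, and the argument names beginning with `example_` are placeholders rather than real model arguments.

```python
import random
from typing import Any


def resolve_args(args: dict[str, Any]) -> dict[str, Any]:
    """Call zero-argument functions to draw values; pass fixed values through."""
    return {key: (value() if callable(value) else value) for key, value in args.items()}


class FakeSatellite:
    """Hypothetical stand-in for a Satellite; only the name matters here."""

    def __init__(self, name: str) -> None:
        self.name = name


def correlated_randomizer(satellites: list[FakeSatellite]) -> dict[FakeSatellite, dict[str, Any]]:
    """Give every satellite the same random draw so their arguments are correlated."""
    shared = random.uniform(0.0, 1.0)
    return {sat: {"example_capacity": shared} for sat in satellites}


def random_time_limit() -> float:
    """Zero-argument function in the style accepted by time_limit."""
    return random.choice([3000.0, 6000.0])


world_style_args = {
    "example_fixed": 3.0,  # used as-is
    "example_random": lambda: random.uniform(1.0, 2.0),  # redrawn on each resolve
}
resolved = resolve_args(world_style_args)

sats = [FakeSatellite("sat-1"), FakeSatellite("sat-2")]
overrides = correlated_randomizer(sats)
```

A function shaped like `correlated_randomizer` matches the `sat_arg_randomizer` signature given above: it takes the list of satellites and returns one override dictionary per satellite.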

get_satellite(name: str) → Satellite[source]

Get a satellite by name.

Parameters: name (str) – Name of the satellite to retrieve.
Returns: The satellite object with the specified name.
Return type: Satellite

reset(seed: int | None = None, options=None) → tuple[tuple[SatObs, ...], dict[str, Any]][source]

Reconstruct the simulator and reset the scenario.

Satellite and world arguments get randomized on reset, if GeneralSatelliteTasking.world_args or Satellite.sat_args includes randomization functions.

Certain classes in bsk_rl have a reset_pre_sim_init and/or reset_post_sim_init method. These methods are respectively called before and after the new Basilisk Simulator is created. These allow for reset actions that feed into the underlying simulation and those that are dependent on the underlying simulation to be performed.
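The ordering of the two hooks can be pictured with the following sketch. It is not the library's implementation: `FakeSimulator`, `Component`, and `reset_environment` are hypothetical stand-ins, and only the method names `reset_pre_sim_init` and `reset_post_sim_init` come from the description above.

```python
class FakeSimulator:
    """Hypothetical stand-in for the Basilisk simulator built on reset."""


class Component:
    """Hypothetical object that implements both reset hooks."""

    def __init__(self, name: str, log: list) -> None:
        self.name = name
        self.log = log

    def reset_pre_sim_init(self) -> None:
        # Runs before the simulator exists: prepare inputs that feed into it.
        self.log.append(f"{self.name}:pre")

    def reset_post_sim_init(self) -> None:
        # Runs after the simulator exists: may depend on the simulation.
        self.log.append(f"{self.name}:post")


def reset_environment(components: list, log: list) -> FakeSimulator:
    """Call every pre hook, build the simulator, then call every post hook."""
    for component in components:
        component.reset_pre_sim_init()
    simulator = FakeSimulator()
    log.append("simulator created")
    for component in components:
        component.reset_post_sim_init()
    return simulator


log: list = []
components = [Component("first", log), Component("second", log)]
simulator = reset_environment(components, log)
```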

Parameters:

seed (int | None) – Gymnasium environment seed.
options – Unused.

Returns:

observation, info

Return type:

tuple[tuple[SatObs, ...], dict[str, Any]]

delete_simulator()[source]

Delete Basilisk objects.

Only the simulator contains strong references to BSK models, so deleting it will delete all Basilisk objects. Enable debug-level logging to verify that the simulator, FSW, dynamics, and world models are all deleted on reset.

property action_space: Space[Iterable[SatAct]]

Compose satellite action spaces into a tuple.

Returns: Joint action space

property observation_space: Space[tuple[SatObs, ...]]

Compose satellite observation spaces into a tuple.

Note: calls reset(), which can be expensive, to determine observation size.

Returns: Joint observation space

step(actions: Iterable[SatAct]) → tuple[tuple[SatObs, ...], float, bool, bool, dict[str, Any]][source]

Propagate the simulation, update information, and get rewards.

Parameters: actions (Iterable[SatAct]) – Joint action for satellites
Returns: observation, reward, terminated, truncated, info
Return type: tuple[tuple[SatObs, ...], float, bool, bool, dict[str, Any]]
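As an illustration of how the returned tuple is typically consumed, here is a sketch of an episode loop. `StubTaskingEnv` is a hypothetical stand-in that only mimics the shapes of the `reset` and `step` return values for two satellites; it does not simulate anything.

```python
class StubTaskingEnv:
    """Hypothetical two-satellite stand-in with Gymnasium-style reset/step."""

    def __init__(self, n_sats: int = 2, max_steps: int = 3) -> None:
        self.n_sats = n_sats
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, seed=None, options=None):
        self.steps = 0
        observation = tuple([0.0] for _ in range(self.n_sats))
        return observation, {}

    def step(self, actions):
        actions = tuple(actions)
        if len(actions) != self.n_sats:
            raise ValueError("expected one action per satellite")
        self.steps += 1
        observation = tuple([float(a)] for a in actions)
        reward = float(sum(actions))
        terminated = False
        truncated = self.steps >= self.max_steps  # stand-in for a time limit
        return observation, reward, terminated, truncated, {}


env = StubTaskingEnv()
observation, info = env.reset(seed=0)
total_reward = 0.0
terminated = truncated = False
while not (terminated or truncated):
    actions = (1, 0)  # one action per satellite, in satellite order
    observation, reward, terminated, truncated, info = env.step(actions)
    total_reward += reward
```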

render() → None[source]

No rendering implemented.

Return type: None

close() → None[source]

Try to cleanly delete everything.

Return type: None

class SatelliteTasking(satellite: Satellite, *args, **kwargs)[source]

Bases: GeneralSatelliteTasking, Generic[SatObs, SatAct]

A special case of GeneralSatelliteTasking for one satellite.

For compatibility with standard training APIs, actions and observations are directly exposed for the single satellite as opposed to being wrapped in a tuple.

Parameters:

satellite (Satellite) – Satellite to be simulated.
*args – Passed to GeneralSatelliteTasking.
**kwargs – Passed to GeneralSatelliteTasking.
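The relationship to the multi-satellite interface can be pictured as wrapping the single action in a one-element tuple and unwrapping the single observation. This is a conceptual sketch with hypothetical stand-in classes (`StubGeneralEnv`, `StubSingleEnv`), not the library's code.

```python
class StubGeneralEnv:
    """Hypothetical tuple-based environment holding exactly one satellite."""

    def step(self, actions):
        (action,) = actions  # exactly one action is expected
        observations = ([float(action)],)  # one observation per satellite
        return observations, 1.0, False, False, {}


class StubSingleEnv(StubGeneralEnv):
    """Exposes the lone satellite's action and observation directly."""

    def step(self, action):
        observations, reward, terminated, truncated, info = super().step((action,))
        return observations[0], reward, terminated, truncated, info


observation, reward, terminated, truncated, info = StubSingleEnv().step(2)
```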

property action_space: Space[SatAct]: Return the single satellite action space.

property observation_space: Box: Return the single satellite observation space.

property satellite: Satellite: Satellite being tasked.

step(action) → tuple[Any, float, bool, bool, dict[str, Any]][source]

Task the satellite with a single action.

Return type: tuple[Any, float, bool, bool, dict[str, Any]]

class ConstellationTasking(*args, meta_agent_groupings: dict[AgentID, list[str]] | None = None, only_retask_idle_meta_agent_members: bool = False, **kwargs)[source]

Bases: GeneralSatelliteTasking, ParallelEnv, Generic[SatObs, SatAct, AgentID]

Implements the PettingZoo parallel API for the GeneralSatelliteTasking environment.

Parameters:

*args – Passed to GeneralSatelliteTasking.
meta_agent_groupings (dict[AgentID, list[str]] | None) – A dictionary mapping agent names to lists of satellite names.
only_retask_idle_meta_agent_members (bool) – If True, only satellites in a meta agent that require retasking will receive actions. Other actions in the meta agent output will be ignored. This may also be useful to control in the training pipeline.
**kwargs – Passed to GeneralSatelliteTasking.
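For illustration, a `meta_agent_groupings` dictionary might look like the one below. The agent and satellite names are hypothetical, and `satellite_to_agent` is a helper invented for this sketch; its rule that a satellite may belong to only one group is an assumption made here, not documented behavior.

```python
meta_agent_groupings = {
    "imaging_team": ["EO-1", "EO-2"],  # one agent controlling two satellites
    "relay_team": ["Relay-1"],
}


def satellite_to_agent(groupings: dict[str, list[str]]) -> dict[str, str]:
    """Invert the mapping; reject satellites listed under two agents (assumed rule)."""
    owner: dict[str, str] = {}
    for agent, sat_names in groupings.items():
        for sat_name in sat_names:
            if sat_name in owner:
                raise ValueError(f"{sat_name} is assigned to more than one agent")
            owner[sat_name] = agent
    return owner


owners = satellite_to_agent(meta_agent_groupings)
```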

reset(seed: int | None = None, options=None) → tuple[tuple[SatObs, ...], dict[str, Any]][source]

Reset the environment and return PettingZoo Parallel API format.

Parameters: seed (int | None)
Return type: tuple[tuple[SatObs, ...], dict[str, Any]]

property agents: list[AgentID]: Agents currently in the environment.

property num_agents: int: Number of agents currently in the environment.

property possible_agents: list[AgentID]: Return the list of all possible agents.

property max_num_agents: int: Maximum number of agents possible in the environment.

property previously_dead: list[AgentID]: Return the list of agents that died at least one step ago.

property observation_spaces: dict[AgentID, Box]: Return the observation space for each agent.

observation_space(agent: AgentID) → Space[SatObs][source]

Return the observation space for a certain agent.

Parameters: agent (AgentID)
Return type: Space[SatObs]

property action_spaces: dict[AgentID, Space[SatAct]]: Return the action space for each agent.

action_space(agent: AgentID) → Space[SatAct][source]

Return the action space for a certain agent.

Parameters: agent (AgentID)
Return type: Space[SatAct]

step(actions: dict[AgentID, SatAct]) → tuple[dict[AgentID, SatObs], dict[AgentID, float], dict[AgentID, bool], dict[AgentID, bool], dict[AgentID, dict]][source]

Step the environment and return PettingZoo Parallel API format.

Parameters: actions (dict[AgentID, SatAct])
Return type: tuple[dict[AgentID, SatObs], dict[AgentID, float], dict[AgentID, bool], dict[AgentID, bool], dict[AgentID, dict]]
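The per-agent dictionaries returned by `step` are usually consumed in a loop that runs while agents remain. The sketch below follows the general PettingZoo Parallel API conventions using `StubParallelEnv`, a hypothetical stand-in with made-up agent names; it does not reproduce any simulation behavior.

```python
class StubParallelEnv:
    """Hypothetical stand-in that returns dictionaries keyed by agent."""

    def __init__(self, agents, max_steps: int = 2) -> None:
        self.possible_agents = list(agents)
        self.agents: list = []
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, seed=None, options=None):
        self.agents = list(self.possible_agents)
        self.steps = 0
        observations = {agent: [0.0] for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        return observations, infos

    def step(self, actions):
        self.steps += 1
        done = self.steps >= self.max_steps
        observations = {agent: [float(actions[agent])] for agent in self.agents}
        rewards = {agent: 1.0 for agent in self.agents}
        terminations = {agent: False for agent in self.agents}
        truncations = {agent: done for agent in self.agents}
        infos = {agent: {} for agent in self.agents}
        if done:
            self.agents = []  # no agents remain once the episode ends
        return observations, rewards, terminations, truncations, infos


env = StubParallelEnv(["sat-1", "sat-2"])
observations, infos = env.reset(seed=0)
returns = {agent: 0.0 for agent in env.possible_agents}
while env.agents:
    actions = {agent: 0 for agent in env.agents}  # one action per live agent
    observations, rewards, terminations, truncations, infos = env.step(actions)
    for agent, reward in rewards.items():
        returns[agent] += reward
```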