Getting Started
This tutorial demonstrates the configuration and use of a simple BSK-RL environment. BSK-RL and dependencies should already be installed at this point (see Installation if you haven’t installed the package yet).
Load Modules
In this tutorial, the environment will be created with gym.make, so it is necessary to import gymnasium (as gym) along with the bsk_rl components used below. Importing from bsk_rl also loads the top-level bsk_rl module.
[1]:
import gymnasium as gym
import numpy as np
from bsk_rl import act, data, obs, scene, sats
from bsk_rl.sim import dyn, fsw
from Basilisk.architecture import bskLogging
bskLogging.setDefaultLogLevel(bskLogging.BSK_WARNING)
If no errors were raised, you have a functional installation of bsk_rl.
Configure the Satellite
Satellites are configurable agents in the environment. To make a new environment, start by specifying the observations and actions of a satellite type, as well as the underlying Basilisk simulation models used by the satellite.
[2]:
class MyScanningSatellite(sats.AccessSatellite):
    observation_spec = [
        obs.SatProperties(
            dict(prop="storage_level_fraction"),
            dict(prop="battery_charge_fraction")
        ),
        obs.Eclipse(),
    ]
    action_spec = [
        act.Scan(duration=60.0),  # Scan for 1 minute
        act.Charge(duration=600.0),  # Charge for 10 minutes
    ]
    dyn_type = dyn.ContinuousImagingDynModel
    fsw_type = fsw.ContinuousImagingFSWModel
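As a rough illustration of what this observation specification produces, the sketch below attaches names to a flat observation vector. The ordering is inferred from how the observation is indexed later in this tutorial (elements 0 and 1 for the two satellite properties, elements 2 and 3 for the eclipse times). The helper `label_observation`, the labels `eclipse_start` and `eclipse_end`, and the sample values are hypothetical and not part of BSK-RL.

```python
# Hypothetical labels; the order is inferred from the indexing used in this tutorial.
OBSERVATION_LABELS = [
    "storage_level_fraction",
    "battery_charge_fraction",
    "eclipse_start",
    "eclipse_end",
]


def label_observation(observation):
    """Pair each element of a flat observation with an illustrative name."""
    if len(observation) != len(OBSERVATION_LABELS):
        raise ValueError("unexpected observation length")
    return {
        label: float(value)
        for label, value in zip(OBSERVATION_LABELS, observation)
    }


# Illustrative values only, not output from the simulator
labeled = label_observation([0.60, 0.33, 5610.0, 2040.0])
print(labeled["battery_charge_fraction"])
```

Naming the elements this way can make later code that inspects the observation easier to read than bare integer indices.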
Based on this class specification, a list of configurable parameters for the satellite can be generated.
[3]:
MyScanningSatellite.default_sat_args()
[3]:
{'hs_min': 0.0,
'maxCounterValue': 4,
'thrMinFireTime': 0.02,
'desatAttitude': 'sun',
'controlAxes_B': [1, 0, 0, 0, 1, 0, 0, 0, 1],
'thrForceSign': 1,
'K': 7.0,
'Ki': -1,
'P': 35.0,
'imageAttErrorRequirement': 0.01,
'imageRateErrorRequirement': None,
'inst_pHat_B': [0, 0, 1],
'utc_init': 'this value will be set by the world model',
'batteryStorageCapacity': 288000.0,
'storedCharge_Init': <function bsk_rl.sim.dyn.base.BasicDynamicsModel.<lambda>()>,
'disturbance_vector': None,
'dragCoeff': 2.2,
'panelArea': 1.0,
'imageTargetMaximumRange': -1,
'instrumentBaudRate': 8000000.0,
'instrumentPowerDraw': -30.0,
'basePowerDraw': 0.0,
'wheelSpeeds': <function bsk_rl.sim.dyn.base.BasicDynamicsModel.<lambda>()>,
'maxWheelSpeed': inf,
'u_max': 0.2,
'rwBasePower': 0.4,
'rwMechToElecEfficiency': 0.0,
'rwElecToMechEfficiency': 0.5,
'panelEfficiency': 0.2,
'nHat_B': array([ 0, 0, -1]),
'mass': 330,
'width': 1.38,
'depth': 1.04,
'height': 1.58,
'sigma_init': <function bsk_rl.sim.dyn.base.DynamicsModel.<lambda>()>,
'omega_init': <function bsk_rl.sim.dyn.base.DynamicsModel.<lambda>()>,
'rN': None,
'vN': None,
'oe': <function bsk_rl.utils.orbital.random_orbit(i: Optional[float] = None, a: Optional[float] = 6871, e: float = 0, Omega: Optional[float] = None, omega: Optional[float] = None, f: Optional[float] = None, alt: float = None, r_body: float = 6371) -> Basilisk.utilities.orbitalMotion.ClassicElements>,
'mu': 398600436000000.0,
'min_orbital_radius': 6578136.6,
'dataStorageCapacity': 160000000.0,
'storageUnitValidCheck': False,
'storageInit': 0,
'thrusterPowerDraw': 0.0,
'transmitterBaudRate': -8000000.0,
'transmitterNumBuffers': 100,
'transmitterPacketSize': None,
'transmitterPowerDraw': -15.0}
When instantiating a satellite, these parameters can be overridden with a constant or rerandomized every time the environment is reset using the sat_args dictionary.
[4]:
sat_args = {}
# Set some parameters as constants
sat_args["imageAttErrorRequirement"] = 0.05
sat_args["dataStorageCapacity"] = 1e10
sat_args["instrumentBaudRate"] = 1e7
sat_args["storedCharge_Init"] = 50000.0
# Randomize the initial storage level on every reset
sat_args["storageInit"] = lambda: np.random.uniform(0.25, 0.75) * 1e10
# Make the satellite
sat = MyScanningSatellite(name="EO1", sat_args=sat_args)
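The difference between a constant and a per-reset randomized parameter can be illustrated with a minimal sketch. The helper `resolve_sat_args` below is hypothetical and is not BSK-RL's implementation; it only assumes that callable values are called once per reset, while all other values are passed through unchanged.

```python
import random

rng = random.Random(0)  # seeded only so that this sketch is reproducible

example_args = {
    "dataStorageCapacity": 1e10,  # constant: identical after every reset
    "storageInit": lambda: rng.uniform(0.25, 0.75) * 1e10,  # redrawn per reset
}


def resolve_sat_args(sat_args):
    """Hypothetical helper: call callables, pass constants through."""
    return {
        key: (value() if callable(value) else value)
        for key, value in sat_args.items()
    }


first_reset = resolve_sat_args(example_args)
second_reset = resolve_sat_args(example_args)
print(first_reset["storageInit"], second_reset["storageInit"])
```

Under this assumption, the constant entry is the same in both dictionaries, while the callable entry yields a fresh draw each time it is resolved.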
Making the Environment
For this example, we will be using the single-agent SatelliteTasking environment. Along with passing the satellite that we configured, the environment takes a scenario, which defines the environment the satellite is acting in, and a rewarder, which defines how data collected from the scenario is rewarded.
[5]:
env = gym.make(
    "SatelliteTasking-v1",
    satellite=sat,
    scenario=scene.UniformNadirScanning(),
    rewarder=data.ScanningTimeReward(),
    time_limit=5700.0,  # approximately 1 orbit
    log_level="INFO",
)
2026-05-19 20:30:28,231 gym INFO Calling env.reset() to get observation space
2026-05-19 20:30:28,232 gym INFO Resetting environment with seed=3155928932
2026-05-19 20:30:28,281 sats.satellite.EO1 INFO <0.00> EO1: Finding opportunity windows from 0.00 to 6000.00 seconds
2026-05-19 20:30:28,291 gym INFO <0.00> Environment reset
Interacting with the Environment
First, the environment is reset.
[6]:
observation, info = env.reset(seed=1)
2026-05-19 20:30:28,297 gym INFO Resetting environment with seed=1
2026-05-19 20:30:28,310 sats.satellite.EO1 INFO <0.00> EO1: Finding opportunity windows from 0.00 to 6000.00 seconds
2026-05-19 20:30:28,318 gym INFO <0.00> Environment reset
Next, we take the scan action (action=0) a few times. This allows the satellite to settle its attitude in the nadir-pointing mode to satisfy imaging conditions. Note that the logs show no reward in the first step and only 24 in the second as the satellite settles, but the third step achieves the full 60 reward (corresponding to 60 seconds of imaging).
[7]:
print("Initial data level:", observation[0], "(randomized by sat_args)")
for _ in range(3):
    observation, reward, terminated, truncated, info = env.step(action=0)
print("  Final data level:", observation[0])
2026-05-19 20:30:28,323 gym INFO <0.00> === STARTING STEP ===
2026-05-19 20:30:28,324 sats.satellite.EO1 INFO <0.00> EO1: action_nadir_scan tasked for 60.0 seconds
2026-05-19 20:30:28,324 sats.satellite.EO1 INFO <0.00> EO1: setting timed terminal event at 60.0
2026-05-19 20:30:28,329 sats.satellite.EO1 INFO <60.00> EO1: timed termination at 60.0 for action_nadir_scan
2026-05-19 20:30:28,329 data.base INFO <60.00> Total reward: {}
2026-05-19 20:30:28,330 comm.communication INFO <60.00> Optimizing data communication between all pairs of satellites
2026-05-19 20:30:28,330 sats.satellite.EO1 INFO <60.00> EO1: Satellite EO1 requires retasking
2026-05-19 20:30:28,332 gym INFO <60.00> Step reward: 0.0
2026-05-19 20:30:28,332 gym INFO <60.00> === STARTING STEP ===
2026-05-19 20:30:28,332 sats.satellite.EO1 INFO <60.00> EO1: action_nadir_scan tasked for 60.0 seconds
2026-05-19 20:30:28,333 sats.satellite.EO1 INFO <60.00> EO1: setting timed terminal event at 120.0
2026-05-19 20:30:28,337 sats.satellite.EO1 INFO <120.00> EO1: timed termination at 120.0 for action_nadir_scan
2026-05-19 20:30:28,338 data.base INFO <120.00> Total reward: {'EO1': 24.0}
2026-05-19 20:30:28,338 comm.communication INFO <120.00> Optimizing data communication between all pairs of satellites
2026-05-19 20:30:28,339 sats.satellite.EO1 INFO <120.00> EO1: Satellite EO1 requires retasking
2026-05-19 20:30:28,340 gym INFO <120.00> Step reward: 24.0
2026-05-19 20:30:28,340 gym INFO <120.00> === STARTING STEP ===
2026-05-19 20:30:28,341 sats.satellite.EO1 INFO <120.00> EO1: action_nadir_scan tasked for 60.0 seconds
2026-05-19 20:30:28,341 sats.satellite.EO1 INFO <120.00> EO1: setting timed terminal event at 180.0
2026-05-19 20:30:28,345 sats.satellite.EO1 INFO <180.00> EO1: timed termination at 180.0 for action_nadir_scan
2026-05-19 20:30:28,346 data.base INFO <180.00> Total reward: {'EO1': 60.0}
2026-05-19 20:30:28,346 comm.communication INFO <180.00> Optimizing data communication between all pairs of satellites
2026-05-19 20:30:28,347 sats.satellite.EO1 INFO <180.00> EO1: Satellite EO1 requires retasking
2026-05-19 20:30:28,349 gym INFO <180.00> Step reward: 60.0
Initial data level: 0.5961613078 (randomized by sat_args)
Final data level: 0.6801613078
The observation reflects the increase in stored data. The first element, corresponding to storage_level_fraction, starts at a random value set by the storageInit function in sat_args and increases based on the time spent imaging.
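The size of this increase can be checked with a short calculation. The sketch below assumes that each unit of reward corresponds to one second of imaging and that data accumulates at instrumentBaudRate for every second imaged; the numbers are taken from sat_args and the step logs above, and treating the rates as bits per second is an assumption.

```python
# Values taken from the tutorial's sat_args and step logs
instrument_baud_rate = 1e7    # instrumentBaudRate (assumed bits per second)
data_storage_capacity = 1e10  # dataStorageCapacity (assumed bits)
step_rewards = [0.0, 24.0, 60.0]  # step rewards, read as seconds of imaging

imaging_seconds = sum(step_rewards)
expected_increase = imaging_seconds * instrument_baud_rate / data_storage_capacity

# Difference between the final and initial printed data levels
observed_increase = 0.6801613078 - 0.5961613078

print(f"expected: {expected_increase:.3f}, observed: {observed_increase:.3f}")
```

Under these assumptions, 84 seconds of imaging fill 0.084 of the storage capacity, which matches the difference between the printed data levels.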
Finally, the charging mode is tasked repeatedly in 10-minute increments until the environment time limit is reached.
[8]:
while not truncated:
    observation, reward, terminated, truncated, info = env.step(action=1)
    print(f"Charge level: {observation[1]:.3f} ({env.unwrapped.simulator.sim_time:.1f} seconds)\n\tEclipse: start: {observation[2]:.1f} end: {observation[3]:.1f}")
2026-05-19 20:30:28,354 gym INFO <180.00> === STARTING STEP ===
2026-05-19 20:30:28,354 sats.satellite.EO1 INFO <180.00> EO1: action_charge tasked for 600.0 seconds
2026-05-19 20:30:28,355 sats.satellite.EO1 INFO <180.00> EO1: setting timed terminal event at 780.0
2026-05-19 20:30:28,385 sats.satellite.EO1 INFO <780.00> EO1: timed termination at 780.0 for action_charge
2026-05-19 20:30:28,386 data.base INFO <780.00> Total reward: {}
2026-05-19 20:30:28,387 comm.communication INFO <780.00> Optimizing data communication between all pairs of satellites
2026-05-19 20:30:28,387 sats.satellite.EO1 INFO <780.00> EO1: Satellite EO1 requires retasking
2026-05-19 20:30:28,390 gym INFO <780.00> Step reward: 0.0
2026-05-19 20:30:28,391 gym INFO <780.00> === STARTING STEP ===
2026-05-19 20:30:28,391 sats.satellite.EO1 INFO <780.00> EO1: action_charge tasked for 600.0 seconds
2026-05-19 20:30:28,391 sats.satellite.EO1 INFO <780.00> EO1: setting timed terminal event at 1380.0
2026-05-19 20:30:28,422 sats.satellite.EO1 INFO <1380.00> EO1: timed termination at 1380.0 for action_charge
2026-05-19 20:30:28,422 data.base INFO <1380.00> Total reward: {}
2026-05-19 20:30:28,423 comm.communication INFO <1380.00> Optimizing data communication between all pairs of satellites
2026-05-19 20:30:28,423 sats.satellite.EO1 INFO <1380.00> EO1: Satellite EO1 requires retasking
2026-05-19 20:30:28,425 gym INFO <1380.00> Step reward: 0.0
2026-05-19 20:30:28,425 gym INFO <1380.00> === STARTING STEP ===
2026-05-19 20:30:28,426 sats.satellite.EO1 INFO <1380.00> EO1: action_charge tasked for 600.0 seconds
2026-05-19 20:30:28,426 sats.satellite.EO1 INFO <1380.00> EO1: setting timed terminal event at 1980.0
2026-05-19 20:30:28,456 sats.satellite.EO1 INFO <1980.00> EO1: timed termination at 1980.0 for action_charge
2026-05-19 20:30:28,457 data.base INFO <1980.00> Total reward: {}
2026-05-19 20:30:28,457 comm.communication INFO <1980.00> Optimizing data communication between all pairs of satellites
2026-05-19 20:30:28,458 sats.satellite.EO1 INFO <1980.00> EO1: Satellite EO1 requires retasking
2026-05-19 20:30:28,459 gym INFO <1980.00> Step reward: 0.0
2026-05-19 20:30:28,460 gym INFO <1980.00> === STARTING STEP ===
2026-05-19 20:30:28,460 sats.satellite.EO1 INFO <1980.00> EO1: action_charge tasked for 600.0 seconds
2026-05-19 20:30:28,461 sats.satellite.EO1 INFO <1980.00> EO1: setting timed terminal event at 2580.0
2026-05-19 20:30:28,491 sats.satellite.EO1 INFO <2580.00> EO1: timed termination at 2580.0 for action_charge
2026-05-19 20:30:28,491 data.base INFO <2580.00> Total reward: {}
2026-05-19 20:30:28,492 comm.communication INFO <2580.00> Optimizing data communication between all pairs of satellites
2026-05-19 20:30:28,492 sats.satellite.EO1 INFO <2580.00> EO1: Satellite EO1 requires retasking
2026-05-19 20:30:28,494 gym INFO <2580.00> Step reward: 0.0
2026-05-19 20:30:28,494 gym INFO <2580.00> === STARTING STEP ===
2026-05-19 20:30:28,494 sats.satellite.EO1 INFO <2580.00> EO1: action_charge tasked for 600.0 seconds
2026-05-19 20:30:28,495 sats.satellite.EO1 INFO <2580.00> EO1: setting timed terminal event at 3180.0
Charge level: 0.332 (780.0 seconds)
Eclipse: start: 5610.0 end: 2040.0
Charge level: 0.329 (1380.0 seconds)
Eclipse: start: 5010.0 end: 1440.0
Charge level: 0.327 (1980.0 seconds)
Eclipse: start: 4410.0 end: 840.0
Charge level: 0.324 (2580.0 seconds)
Eclipse: start: 3810.0 end: 240.0
2026-05-19 20:30:28,526 sats.satellite.EO1 INFO <3180.00> EO1: timed termination at 3180.0 for action_charge
2026-05-19 20:30:28,526 data.base INFO <3180.00> Total reward: {}
2026-05-19 20:30:28,527 comm.communication INFO <3180.00> Optimizing data communication between all pairs of satellites
2026-05-19 20:30:28,527 sats.satellite.EO1 INFO <3180.00> EO1: Satellite EO1 requires retasking
2026-05-19 20:30:28,532 gym INFO <3180.00> Step reward: 0.0
2026-05-19 20:30:28,532 gym INFO <3180.00> === STARTING STEP ===
2026-05-19 20:30:28,533 sats.satellite.EO1 INFO <3180.00> EO1: action_charge tasked for 600.0 seconds
2026-05-19 20:30:28,533 sats.satellite.EO1 INFO <3180.00> EO1: setting timed terminal event at 3780.0
2026-05-19 20:30:28,564 sats.satellite.EO1 INFO <3780.00> EO1: timed termination at 3780.0 for action_charge
2026-05-19 20:30:28,564 data.base INFO <3780.00> Total reward: {}
2026-05-19 20:30:28,565 comm.communication INFO <3780.00> Optimizing data communication between all pairs of satellites
2026-05-19 20:30:28,565 sats.satellite.EO1 INFO <3780.00> EO1: Satellite EO1 requires retasking
2026-05-19 20:30:28,567 gym INFO <3780.00> Step reward: 0.0
2026-05-19 20:30:28,567 gym INFO <3780.00> === STARTING STEP ===
2026-05-19 20:30:28,568 sats.satellite.EO1 INFO <3780.00> EO1: action_charge tasked for 600.0 seconds
2026-05-19 20:30:28,568 sats.satellite.EO1 INFO <3780.00> EO1: setting timed terminal event at 4380.0
Charge level: 0.469 (3180.0 seconds)
Eclipse: start: 3210.0 end: 5310.0
Charge level: 0.698 (3780.0 seconds)
Eclipse: start: 2610.0 end: 4710.0
2026-05-19 20:30:28,598 sats.satellite.EO1 INFO <4380.00> EO1: timed termination at 4380.0 for action_charge
2026-05-19 20:30:28,599 data.base INFO <4380.00> Total reward: {}
2026-05-19 20:30:28,600 comm.communication INFO <4380.00> Optimizing data communication between all pairs of satellites
2026-05-19 20:30:28,600 sats.satellite.EO1 INFO <4380.00> EO1: Satellite EO1 requires retasking
2026-05-19 20:30:28,601 gym INFO <4380.00> Step reward: 0.0
2026-05-19 20:30:28,602 gym INFO <4380.00> === STARTING STEP ===
2026-05-19 20:30:28,602 sats.satellite.EO1 INFO <4380.00> EO1: action_charge tasked for 600.0 seconds
2026-05-19 20:30:28,602 sats.satellite.EO1 INFO <4380.00> EO1: setting timed terminal event at 4980.0
2026-05-19 20:30:28,632 sats.satellite.EO1 INFO <4980.00> EO1: timed termination at 4980.0 for action_charge
2026-05-19 20:30:28,633 data.base INFO <4980.00> Total reward: {}
2026-05-19 20:30:28,633 comm.communication INFO <4980.00> Optimizing data communication between all pairs of satellites
2026-05-19 20:30:28,634 sats.satellite.EO1 INFO <4980.00> EO1: Satellite EO1 requires retasking
2026-05-19 20:30:28,636 gym INFO <4980.00> Step reward: 0.0
2026-05-19 20:30:28,636 gym INFO <4980.00> === STARTING STEP ===
2026-05-19 20:30:28,637 sats.satellite.EO1 INFO <4980.00> EO1: action_charge tasked for 600.0 seconds
2026-05-19 20:30:28,637 sats.satellite.EO1 INFO <4980.00> EO1: setting timed terminal event at 5580.0
2026-05-19 20:30:28,667 sats.satellite.EO1 INFO <5580.00> EO1: timed termination at 5580.0 for action_charge
2026-05-19 20:30:28,668 data.base INFO <5580.00> Total reward: {}
2026-05-19 20:30:28,668 comm.communication INFO <5580.00> Optimizing data communication between all pairs of satellites
2026-05-19 20:30:28,669 sats.satellite.EO1 INFO <5580.00> EO1: Satellite EO1 requires retasking
2026-05-19 20:30:28,670 gym INFO <5580.00> Step reward: 0.0
2026-05-19 20:30:28,670 gym INFO <5580.00> === STARTING STEP ===
2026-05-19 20:30:28,671 sats.satellite.EO1 INFO <5580.00> EO1: action_charge tasked for 600.0 seconds
2026-05-19 20:30:28,671 sats.satellite.EO1 INFO <5580.00> EO1: setting timed terminal event at 6180.0
2026-05-19 20:30:28,678 data.base INFO <5700.00> Total reward: {}
2026-05-19 20:30:28,679 comm.communication INFO <5700.00> Optimizing data communication between all pairs of satellites
2026-05-19 20:30:28,680 gym INFO <5700.00> Step reward: 0.0
2026-05-19 20:30:28,681 gym INFO <5700.00> Episode terminated: False
2026-05-19 20:30:28,681 gym INFO <5700.00> Episode truncated: True
Charge level: 0.928 (4380.0 seconds)
Eclipse: start: 2010.0 end: 4110.0
Charge level: 1.000 (4980.0 seconds)
Eclipse: start: 1410.0 end: 3510.0
Charge level: 1.000 (5580.0 seconds)
Eclipse: start: 810.0 end: 2910.0
Charge level: 1.000 (5700.0 seconds)
Eclipse: start: 690.0 end: 2790.0
The battery charge decreases while the satellite is in eclipse, but once the satellite leaves eclipse, the battery quickly charges to full.
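The eclipse values printed above appear to be times until the next eclipse start and the next eclipse end, measured from the current simulation time. This is an interpretation inferred from the output, not a documented guarantee. Under that reading, the satellite is in eclipse whenever the next end comes before the next start. The helper `in_eclipse` below is a hypothetical sketch, applied to a few of the printed values.

```python
def in_eclipse(eclipse_start, eclipse_end):
    """Return True if the next eclipse end precedes the next eclipse start."""
    return eclipse_end < eclipse_start


# (simulation time, eclipse start, eclipse end) copied from the output above
samples = [
    (780.0, 5610.0, 2040.0),
    (2580.0, 3810.0, 240.0),
    (3180.0, 3210.0, 5310.0),
    (4380.0, 2010.0, 4110.0),
]

for sim_time, start, end in samples:
    state = "eclipse" if in_eclipse(start, end) else "sunlit"
    print(f"{sim_time:7.1f} s: {state}")
```

The first two samples are classified as eclipse and the last two as sunlit, which agrees with the charge level falling through 2580 seconds and rising afterwards.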