Getting Started
This tutorial demonstrates the configuration and use of a simple BSK-RL environment. BSK-RL and dependencies should already be installed at this point (see Installation if you haven’t installed the package yet).
Load Modules
In this tutorial, the environment will be created with gym.make, so it is necessary to import the top-level bsk_rl module as well as gym and bsk_rl components.
[1]:
import gymnasium as gym
import numpy as np
from bsk_rl import act, data, obs, scene, sats
from bsk_rl.sim import dyn, fsw
from Basilisk.architecture import bskLogging
bskLogging.setDefaultLogLevel(bskLogging.BSK_WARNING)
If no errors were raised, you have a functional installation of bsk_rl.
Configure the Satellite
Satellites are configurable agents in the environment. To make a new environment, start by specifying the observations and actions of a satellite type, as well as the underlying Basilisk simulation models used by the satellite.
[2]:
class MyScanningSatellite(sats.AccessSatellite):
    observation_spec = [
        obs.SatProperties(
            dict(prop="storage_level_fraction"),
            dict(prop="battery_charge_fraction")
        ),
        obs.Eclipse(),
    ]
    action_spec = [
        act.Scan(duration=60.0),  # Scan for 1 minute
        act.Charge(duration=600.0),  # Charge for 10 minutes
    ]
    dyn_type = dyn.ContinuousImagingDynModel
    fsw_type = fsw.ContinuousImagingFSWModel
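As a rough illustration of how this specification maps to the flat observation vector indexed later in this tutorial (observation[0] through observation[3]), the sketch below labels each element. The OBS_LAYOUT list and label_observation helper are hypothetical names for this example, not part of bsk_rl, and the ordering is inferred from the tutorial's own indexing rather than from the library's documentation.

```python
# Hypothetical helper, not part of bsk_rl: label a flat observation vector.
# The order is inferred from how this tutorial indexes the observation:
# the two SatProperties entries first, then the two Eclipse entries.
OBS_LAYOUT = [
    "storage_level_fraction",
    "battery_charge_fraction",
    "eclipse_start",
    "eclipse_end",
]


def label_observation(observation):
    """Pair each observation element with its assumed name."""
    if len(observation) != len(OBS_LAYOUT):
        raise ValueError("unexpected observation length")
    return dict(zip(OBS_LAYOUT, observation))


# Illustrative values only
labeled = label_observation([0.5, 0.339, 5340.0, 1800.0])
print(labeled["battery_charge_fraction"])
```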
Based on this class specification, a list of configurable parameters for the satellite can be generated.
[3]:
MyScanningSatellite.default_sat_args()
[3]:
{'hs_min': 0.0,
'maxCounterValue': 4,
'thrMinFireTime': 0.02,
'desatAttitude': 'sun',
'controlAxes_B': [1, 0, 0, 0, 1, 0, 0, 0, 1],
'thrForceSign': 1,
'K': 7.0,
'Ki': -1,
'P': 35.0,
'imageAttErrorRequirement': 0.01,
'imageRateErrorRequirement': None,
'inst_pHat_B': [0, 0, 1],
'utc_init': 'this value will be set by the world model',
'batteryStorageCapacity': 288000.0,
'storedCharge_Init': <function bsk_rl.sim.dyn.BasicDynamicsModel.<lambda>()>,
'disturbance_vector': None,
'dragCoeff': 2.2,
'imageTargetMaximumRange': -1,
'instrumentBaudRate': 8000000.0,
'instrumentPowerDraw': -30.0,
'basePowerDraw': 0.0,
'wheelSpeeds': <function bsk_rl.sim.dyn.BasicDynamicsModel.<lambda>()>,
'maxWheelSpeed': inf,
'u_max': 0.2,
'rwBasePower': 0.4,
'rwMechToElecEfficiency': 0.0,
'rwElecToMechEfficiency': 0.5,
'panelArea': 1.0,
'panelEfficiency': 0.2,
'nHat_B': array([ 0, 0, -1]),
'mass': 330,
'width': 1.38,
'depth': 1.04,
'height': 1.58,
'sigma_init': <function bsk_rl.sim.dyn.BasicDynamicsModel.<lambda>()>,
'omega_init': <function bsk_rl.sim.dyn.BasicDynamicsModel.<lambda>()>,
'rN': None,
'vN': None,
'oe': <function bsk_rl.utils.orbital.random_orbit(i: Optional[float] = 45.0, alt: float = 500, r_body: float = 6371, e: float = 0, Omega: Optional[float] = None, omega: Optional[float] = 0, f: Optional[float] = None) -> Basilisk.utilities.orbitalMotion.ClassicElements>,
'mu': 398600436000000.0,
'dataStorageCapacity': 160000000.0,
'storageUnitValidCheck': False,
'storageInit': 0,
'thrusterPowerDraw': 0.0,
'transmitterBaudRate': -8000000.0,
'transmitterNumBuffers': 100,
'transmitterPowerDraw': -15.0}
When instantiating a satellite, these parameters can be overridden through the sat_args dictionary, either with a constant or with a function that is called to rerandomize the value every time the environment is reset.
[4]:
sat_args = {}
# Set some parameters as constants
sat_args["imageAttErrorRequirement"] = 0.05
sat_args["dataStorageCapacity"] = 1e10
sat_args["instrumentBaudRate"] = 1e7
sat_args["storedCharge_Init"] = 50000.0
# Randomize the initial storage level on every reset
sat_args["storageInit"] = lambda: np.random.uniform(0.25, 0.75) * 1e10
# Make the satellite
sat = MyScanningSatellite(name="EO1", sat_args=sat_args)
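To make the distinction between constant and randomized parameters concrete, here is a minimal sketch of how a dictionary mixing constants and zero-argument callables could be resolved at each reset. The resolve_args helper is hypothetical and only illustrates the idea; it is not bsk_rl's actual implementation, and it uses the standard-library random module in place of NumPy to stay dependency-free.

```python
import random


def resolve_args(args):
    """Call zero-argument callables; pass constants through unchanged."""
    return {key: value() if callable(value) else value for key, value in args.items()}


example_args = {
    "dataStorageCapacity": 1e10,  # constant: same on every reset
    "storageInit": lambda: random.uniform(0.25, 0.75) * 1e10,  # resampled
}

# Each call stands in for one environment reset
first_reset = resolve_args(example_args)
second_reset = resolve_args(example_args)

print(first_reset["dataStorageCapacity"] == second_reset["dataStorageCapacity"])  # True
print(0.25e10 <= first_reset["storageInit"] <= 0.75e10)  # True
```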
Making the Environment
For this example, we will be using the single-agent SatelliteTasking environment. Along with passing the satellite that we configured, the environment takes a scenario, which defines the environment the satellite is acting in, and a rewarder, which defines how data collected from the scenario is rewarded.
[5]:
env = gym.make(
    "SatelliteTasking-v1",
    satellite=sat,
    scenario=scene.UniformNadirScanning(),
    rewarder=data.ScanningTimeReward(),
    time_limit=5700.0,  # approximately 1 orbit
    log_level="INFO",
)
2024-09-12 15:07:02,436 gym INFO Calling env.reset() to get observation space
2024-09-12 15:07:02,436 gym INFO Resetting environment with seed=426283552
2024-09-12 15:07:02,518 sats.satellite.EO1 INFO <0.00> EO1: Finding opportunity windows from 0.00 to 5700.00 seconds
2024-09-12 15:07:02,533 gym INFO <0.00> Environment reset
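The effect of time_limit on step boundaries can be sketched with simple arithmetic. Assuming each step ends at the earlier of the action's requested duration and the time limit (which is what the logs later in this tutorial show), the hypothetical step_end_times helper below lists when repeated fixed-duration actions would end.

```python
def step_end_times(start, duration, time_limit):
    """Return the end time of each step for a repeated fixed-duration action.

    Assumes a step ends at the earlier of start + duration and time_limit.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    times = []
    t = start
    while t < time_limit:
        t = min(t + duration, time_limit)
        times.append(t)
    return times


# Repeating a 600 s action from t = 180 s under a 5700 s limit
ends = step_end_times(start=180.0, duration=600.0, time_limit=5700.0)
print(ends)
```

Under this assumption the last step is cut short at 5700 s, the point at which the episode is truncated.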
Interacting with the Environment
First, the environment is reset.
[6]:
observation, info = env.reset(seed=1)
2024-09-12 15:07:02,602 gym INFO Resetting environment with seed=1
2024-09-12 15:07:02,738 sats.satellite.EO1 INFO <0.00> EO1: Finding opportunity windows from 0.00 to 5700.00 seconds
2024-09-12 15:07:02,751 gym INFO <0.00> Environment reset
Next, we take the scan action (action=0) a few times. This allows the satellite to settle its attitude in the nadir-pointing mode to satisfy imaging conditions. Note that the logs show no reward in the first step and only 30 reward in the second step as the satellite settles, while the third step achieves the full 60 reward (corresponding to 60 seconds of imaging).
[7]:
print("Initial data level:", observation[0], "(randomized by sat_args)")
for _ in range(3):
    observation, reward, terminated, truncated, info = env.step(action=0)
print("  Final data level:", observation[0])
2024-09-12 15:07:02,754 gym INFO <0.00> === STARTING STEP ===
2024-09-12 15:07:02,755 sats.satellite.EO1 INFO <0.00> EO1: action_nadir_scan tasked for 60.0 seconds
2024-09-12 15:07:02,755 sats.satellite.EO1 INFO <0.00> EO1: setting timed terminal event at 60.0
2024-09-12 15:07:02,759 sats.satellite.EO1 INFO <60.00> EO1: timed termination at 60.0 for action_nadir_scan
2024-09-12 15:07:02,760 data.base INFO <60.00> Data reward: {'EO1': 0.0}
2024-09-12 15:07:02,760 sats.satellite.EO1 INFO <60.00> EO1: Satellite EO1 requires retasking
2024-09-12 15:07:02,761 gym INFO <60.00> Step reward: 0.0
2024-09-12 15:07:02,761 gym INFO <60.00> === STARTING STEP ===
2024-09-12 15:07:02,761 sats.satellite.EO1 INFO <60.00> EO1: action_nadir_scan tasked for 60.0 seconds
2024-09-12 15:07:02,761 sats.satellite.EO1 INFO <60.00> EO1: setting timed terminal event at 120.0
2024-09-12 15:07:02,765 sats.satellite.EO1 INFO <120.00> EO1: timed termination at 120.0 for action_nadir_scan
2024-09-12 15:07:02,765 data.base INFO <120.00> Data reward: {'EO1': 30.0}
2024-09-12 15:07:02,766 sats.satellite.EO1 INFO <120.00> EO1: Satellite EO1 requires retasking
2024-09-12 15:07:02,766 gym INFO <120.00> Step reward: 30.0
2024-09-12 15:07:02,766 gym INFO <120.00> === STARTING STEP ===
2024-09-12 15:07:02,767 sats.satellite.EO1 INFO <120.00> EO1: action_nadir_scan tasked for 60.0 seconds
2024-09-12 15:07:02,767 sats.satellite.EO1 INFO <120.00> EO1: setting timed terminal event at 180.0
2024-09-12 15:07:02,770 sats.satellite.EO1 INFO <180.00> EO1: timed termination at 180.0 for action_nadir_scan
2024-09-12 15:07:02,771 data.base INFO <180.00> Data reward: {'EO1': 60.0}
2024-09-12 15:07:02,771 sats.satellite.EO1 INFO <180.00> EO1: Satellite EO1 requires retasking
2024-09-12 15:07:02,771 gym INFO <180.00> Step reward: 60.0
Initial data level: 0.7341307878 (randomized by sat_args)
Final data level: 0.8241307878
The observation reflects the increase in stored data. The first element, corresponding to storage_level_fraction, starts at a random value set by the storageInit function in sat_args and increases based on the time spent imaging.
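The size of this increase can be checked by hand. Assuming the storage fraction grows by instrumentBaudRate / dataStorageCapacity for every second of imaging (an interpretation of the parameters set in sat_args, not a statement of the simulator's internals), the 0 + 30 + 60 = 90 seconds of rewarded imaging account for the change in the printed data level:

```python
instrument_baud_rate = 1e7  # sat_args["instrumentBaudRate"]
data_storage_capacity = 1e10  # sat_args["dataStorageCapacity"]
imaging_seconds = 0.0 + 30.0 + 60.0  # step rewards from the logs

# Assumed model: fraction gained = seconds imaged * baud rate / capacity
expected_increase = imaging_seconds * instrument_baud_rate / data_storage_capacity
observed_increase = 0.8241307878 - 0.7341307878  # final - initial, as printed

print(f"{expected_increase:.2f} {observed_increase:.2f}")  # 0.09 0.09
```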
Finally, the charging mode is tasked repeatedly in 10-minute increments until the environment time limit is reached.
[8]:
while not truncated:
    observation, reward, terminated, truncated, info = env.step(action=1)
    print(f"Charge level: {observation[1]:.3f} ({env.unwrapped.simulator.sim_time:.1f} seconds)\n\tEclipse: start: {observation[2]:.1f} end: {observation[3]:.1f}")
2024-09-12 15:07:02,775 gym INFO <180.00> === STARTING STEP ===
2024-09-12 15:07:02,775 sats.satellite.EO1 INFO <180.00> EO1: action_charge tasked for 600.0 seconds
2024-09-12 15:07:02,775 sats.satellite.EO1 INFO <180.00> EO1: setting timed terminal event at 780.0
2024-09-12 15:07:02,809 sats.satellite.EO1 INFO <780.00> EO1: timed termination at 780.0 for action_charge
2024-09-12 15:07:02,810 data.base INFO <780.00> Data reward: {'EO1': 0.0}
2024-09-12 15:07:02,810 sats.satellite.EO1 INFO <780.00> EO1: Satellite EO1 requires retasking
2024-09-12 15:07:02,836 gym INFO <780.00> Step reward: 0.0
2024-09-12 15:07:02,837 gym INFO <780.00> === STARTING STEP ===
2024-09-12 15:07:02,837 sats.satellite.EO1 INFO <780.00> EO1: action_charge tasked for 600.0 seconds
2024-09-12 15:07:02,837 sats.satellite.EO1 INFO <780.00> EO1: setting timed terminal event at 1380.0
2024-09-12 15:07:02,868 sats.satellite.EO1 INFO <1380.00> EO1: timed termination at 1380.0 for action_charge
2024-09-12 15:07:02,868 data.base INFO <1380.00> Data reward: {'EO1': 0.0}
2024-09-12 15:07:02,869 sats.satellite.EO1 INFO <1380.00> EO1: Satellite EO1 requires retasking
2024-09-12 15:07:02,869 gym INFO <1380.00> Step reward: 0.0
2024-09-12 15:07:02,870 gym INFO <1380.00> === STARTING STEP ===
2024-09-12 15:07:02,870 sats.satellite.EO1 INFO <1380.00> EO1: action_charge tasked for 600.0 seconds
2024-09-12 15:07:02,870 sats.satellite.EO1 INFO <1380.00> EO1: setting timed terminal event at 1980.0
2024-09-12 15:07:02,903 sats.satellite.EO1 INFO <1980.00> EO1: timed termination at 1980.0 for action_charge
2024-09-12 15:07:02,903 data.base INFO <1980.00> Data reward: {'EO1': 0.0}
2024-09-12 15:07:02,903 sats.satellite.EO1 INFO <1980.00> EO1: Satellite EO1 requires retasking
2024-09-12 15:07:02,904 gym INFO <1980.00> Step reward: 0.0
2024-09-12 15:07:02,904 gym INFO <1980.00> === STARTING STEP ===
2024-09-12 15:07:02,904 sats.satellite.EO1 INFO <1980.00> EO1: action_charge tasked for 600.0 seconds
2024-09-12 15:07:02,905 sats.satellite.EO1 INFO <1980.00> EO1: setting timed terminal event at 2580.0
2024-09-12 15:07:02,937 sats.satellite.EO1 INFO <2580.00> EO1: timed termination at 2580.0 for action_charge
2024-09-12 15:07:02,937 data.base INFO <2580.00> Data reward: {'EO1': 0.0}
2024-09-12 15:07:02,937 sats.satellite.EO1 INFO <2580.00> EO1: Satellite EO1 requires retasking
2024-09-12 15:07:02,967 gym INFO <2580.00> Step reward: 0.0
Charge level: 0.339 (780.0 seconds)
Eclipse: start: 5340.0 end: 1800.0
Charge level: 0.337 (1380.0 seconds)
Eclipse: start: 4740.0 end: 1200.0
Charge level: 0.334 (1980.0 seconds)
Eclipse: start: 4140.0 end: 600.0
2024-09-12 15:07:02,968 gym INFO <2580.00> === STARTING STEP ===
2024-09-12 15:07:02,968 sats.satellite.EO1 INFO <2580.00> EO1: action_charge tasked for 600.0 seconds
2024-09-12 15:07:02,968 sats.satellite.EO1 INFO <2580.00> EO1: setting timed terminal event at 3180.0
2024-09-12 15:07:03,002 sats.satellite.EO1 INFO <3180.00> EO1: timed termination at 3180.0 for action_charge
2024-09-12 15:07:03,002 data.base INFO <3180.00> Data reward: {'EO1': 0.0}
2024-09-12 15:07:03,003 sats.satellite.EO1 INFO <3180.00> EO1: Satellite EO1 requires retasking
2024-09-12 15:07:03,003 gym INFO <3180.00> Step reward: 0.0
2024-09-12 15:07:03,003 gym INFO <3180.00> === STARTING STEP ===
2024-09-12 15:07:03,004 sats.satellite.EO1 INFO <3180.00> EO1: action_charge tasked for 600.0 seconds
2024-09-12 15:07:03,004 sats.satellite.EO1 INFO <3180.00> EO1: setting timed terminal event at 3780.0
2024-09-12 15:07:03,037 sats.satellite.EO1 INFO <3780.00> EO1: timed termination at 3780.0 for action_charge
2024-09-12 15:07:03,037 data.base INFO <3780.00> Data reward: {'EO1': 0.0}
Charge level: 0.354 (2580.0 seconds)
Eclipse: start: 3540.0 end: 5670.0
Charge level: 0.942 (3180.0 seconds)
Eclipse: start: 2940.0 end: 5070.0
2024-09-12 15:07:03,037 sats.satellite.EO1 INFO <3780.00> EO1: Satellite EO1 requires retasking
2024-09-12 15:07:03,038 gym INFO <3780.00> Step reward: 0.0
2024-09-12 15:07:03,038 gym INFO <3780.00> === STARTING STEP ===
2024-09-12 15:07:03,038 sats.satellite.EO1 INFO <3780.00> EO1: action_charge tasked for 600.0 seconds
2024-09-12 15:07:03,039 sats.satellite.EO1 INFO <3780.00> EO1: setting timed terminal event at 4380.0
2024-09-12 15:07:03,073 sats.satellite.EO1 INFO <4380.00> EO1: timed termination at 4380.0 for action_charge
2024-09-12 15:07:03,074 data.base INFO <4380.00> Data reward: {'EO1': 0.0}
2024-09-12 15:07:03,074 sats.satellite.EO1 INFO <4380.00> EO1: Satellite EO1 requires retasking
2024-09-12 15:07:03,075 gym INFO <4380.00> Step reward: 0.0
2024-09-12 15:07:03,075 gym INFO <4380.00> === STARTING STEP ===
2024-09-12 15:07:03,076 sats.satellite.EO1 INFO <4380.00> EO1: action_charge tasked for 600.0 seconds
2024-09-12 15:07:03,076 sats.satellite.EO1 INFO <4380.00> EO1: setting timed terminal event at 4980.0
2024-09-12 15:07:03,111 sats.satellite.EO1 INFO <4980.00> EO1: timed termination at 4980.0 for action_charge
2024-09-12 15:07:03,111 data.base INFO <4980.00> Data reward: {'EO1': 0.0}
2024-09-12 15:07:03,112 sats.satellite.EO1 INFO <4980.00> EO1: Satellite EO1 requires retasking
2024-09-12 15:07:03,112 gym INFO <4980.00> Step reward: 0.0
2024-09-12 15:07:03,113 gym INFO <4980.00> === STARTING STEP ===
2024-09-12 15:07:03,113 sats.satellite.EO1 INFO <4980.00> EO1: action_charge tasked for 600.0 seconds
2024-09-12 15:07:03,113 sats.satellite.EO1 INFO <4980.00> EO1: setting timed terminal event at 5580.0
2024-09-12 15:07:03,149 sats.satellite.EO1 INFO <5580.00> EO1: timed termination at 5580.0 for action_charge
2024-09-12 15:07:03,149 data.base INFO <5580.00> Data reward: {'EO1': 0.0}
2024-09-12 15:07:03,150 sats.satellite.EO1 INFO <5580.00> EO1: Satellite EO1 requires retasking
2024-09-12 15:07:03,150 gym INFO <5580.00> Step reward: 0.0
2024-09-12 15:07:03,151 gym INFO <5580.00> === STARTING STEP ===
2024-09-12 15:07:03,151 sats.satellite.EO1 INFO <5580.00> EO1: action_charge tasked for 600.0 seconds
2024-09-12 15:07:03,151 sats.satellite.EO1 INFO <5580.00> EO1: setting timed terminal event at 6180.0
2024-09-12 15:07:03,159 data.base INFO <5700.00> Data reward: {'EO1': 0.0}
2024-09-12 15:07:03,160 gym INFO <5700.00> Step reward: 0.0
2024-09-12 15:07:03,160 gym INFO <5700.00> Episode terminated: False
2024-09-12 15:07:03,161 gym INFO <5700.00> Episode truncated: True
Charge level: 1.000 (3780.0 seconds)
Eclipse: start: 2340.0 end: 4470.0
Charge level: 1.000 (4380.0 seconds)
Eclipse: start: 1740.0 end: 3870.0
Charge level: 1.000 (4980.0 seconds)
Eclipse: start: 1140.0 end: 3270.0
Charge level: 1.000 (5580.0 seconds)
Eclipse: start: 540.0 end: 2670.0
Charge level: 1.000 (5700.0 seconds)
Eclipse: start: 420.0 end: 2550.0
It is observed that the battery charge decreases while the satellite is in eclipse, but once the satellite is out of eclipse, the battery quickly charges to full.
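One way to read the eclipse observation alongside the charge level: the printed values are consistent with start and end being the times in seconds until the next eclipse begins and ends, so the satellite would be in eclipse whenever the next end comes before the next start. The in_eclipse helper below is a hypothetical sketch of that interpretation, not a bsk_rl function.

```python
def in_eclipse(time_to_start, time_to_end):
    """Assumed rule: in eclipse if the next eclipse end precedes the next start."""
    return time_to_end < time_to_start


# (sim time [s], charge fraction, eclipse start, eclipse end), from the output above
samples = [
    (780.0, 0.339, 5340.0, 1800.0),
    (1380.0, 0.337, 4740.0, 1200.0),
    (3180.0, 0.942, 2940.0, 5070.0),
]

for sim_time, charge, start, end in samples:
    state = "eclipse" if in_eclipse(start, end) else "sunlight"
    print(f"{sim_time:.0f} s: charge {charge:.3f}, {state}")
```

Under this reading, the two samples with falling charge are classified as eclipse and the sample with high charge as sunlight.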