Training with RLlib PPO

RLlib is a high-performance, distributed reinforcement learning library. It is preferable to other RL libraries (e.g. Stable Baselines 3) for bsk_rl environments because it steps environment copies asynchronously. Because of the variable step lengths, variable episode step counts, and long episode reset times, stepping each environment independently can increase step throughput by 2-5 times.
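The throughput argument can be illustrated with a toy timing model. This is only a sketch and not RLlib's actual scheduler: the per-step durations are invented, and lockstep_time and independent_time are hypothetical helper names introduced here.

```python
# Toy timing model; all durations are invented for illustration only.


def lockstep_time(step_times):
    """Wall time if every copy waits for the slowest copy on each step."""
    return sum(max(durations) for durations in zip(*step_times))


def independent_time(step_times):
    """Wall time if each copy advances on its own worker without waiting."""
    return max(sum(durations) for durations in step_times)


# Two environment copies whose slow and fast steps are out of phase.
step_times = [
    [1.0, 5.0, 1.0, 5.0],  # copy A: seconds per step
    [5.0, 1.0, 5.0, 1.0],  # copy B: seconds per step
]

speedup = lockstep_time(step_times) / independent_time(step_times)
print(f"lockstep:    {lockstep_time(step_times)} s")
print(f"independent: {independent_time(step_times)} s")
print(f"speedup:     {speedup:.2f}x")
```

In the lockstep case every step costs as much as the slowest copy, so the more the step durations vary between copies, the larger the gain from stepping independently.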

Warning: RLlib had a bug that caused an undesirable timeout, which stopped training. It has since been resolved: https://github.com/ray-project/ray/pull/45147

RLlib is actively developed and can change significantly from version to version. For this script, the following version is used:

[1]:
from importlib.metadata import version
version("ray")  # Parent package of RLlib
[1]:
'2.35.0'
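Because RLlib can change between versions, a script may want to flag a mismatch early. The following is a minimal sketch of such a guard; same_minor is a hypothetical helper, and comparing only the major and minor components is an assumption about what counts as "close enough".

```python
from importlib.metadata import PackageNotFoundError, version

TESTED_RAY = "2.35.0"  # version reported in the cell above


def same_minor(installed: str, tested: str = TESTED_RAY) -> bool:
    """Return True when the major and minor version components match."""
    return installed.split(".")[:2] == tested.split(".")[:2]


try:
    installed = version("ray")
except PackageNotFoundError:
    installed = None  # Ray is not installed in this environment

if installed is None:
    print("ray is not installed")
elif not same_minor(installed):
    print(f"ray {installed} differs from tested {TESTED_RAY}; APIs may have changed")
else:
    print(f"ray {installed} matches the tested minor version")
```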

Define the Environment

A nadir-scanning environment is created, similar to the one used in this paper. The satellite has to collect data while managing the data buffer level and battery level.

First, the satellite class is defined. A custom dynamics model is created that defines a few additional properties to use in the state.

[2]:
import numpy as np
from bsk_rl import act, data, obs, sats, scene
from bsk_rl.sim import dyn, fsw

class ScanningDownlinkDynModel(dyn.ContinuousImagingDynModel, dyn.GroundStationDynModel):
    # Define some custom properties to be accessed in the state
    @property
    def instrument_pointing_error(self) -> float:
        r_BN_P_unit = self.r_BN_P/np.linalg.norm(self.r_BN_P)
        c_hat_P = self.satellite.fsw.c_hat_P
        return np.arccos(np.dot(-r_BN_P_unit, c_hat_P))

    @property
    def solar_pointing_error(self) -> float:
        a = self.world.gravFactory.spiceObject.planetStateOutMsgs[
            self.world.sun_index
        ].read().PositionVector
        a_hat_N = a / np.linalg.norm(a)
        nHat_B = self.satellite.sat_args["nHat_B"]
        NB = np.transpose(self.BN)
        nHat_N = NB @ nHat_B
        return np.arccos(np.dot(nHat_N, a_hat_N))

class ScanningSatellite(sats.AccessSatellite):
    observation_spec = [
        obs.SatProperties(
            dict(prop="storage_level_fraction"),
            dict(prop="battery_charge_fraction"),
            dict(prop="wheel_speeds_fraction"),
            dict(prop="instrument_pointing_error", norm=np.pi),
            dict(prop="solar_pointing_error", norm=np.pi)
        ),
        obs.OpportunityProperties(
            dict(prop="opportunity_open", norm=5700),
            dict(prop="opportunity_close", norm=5700),
            type="ground_station",
            n_ahead_observe=1,
        ),
        obs.Eclipse(norm=5700),
        obs.Time(),
    ]
    action_spec = [
        act.Scan(duration=180.0),
        act.Charge(duration=120.0),
        act.Downlink(duration=60.0),
        act.Desat(duration=60.0),
    ]
    dyn_type = ScanningDownlinkDynModel
    fsw_type = fsw.ContinuousImagingFSWModel
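The instrument_pointing_error property above is the angle between the nadir direction and the instrument boresight. A standalone sketch of the same geometry follows, using only the standard library; the position vector is made up, pointing_error is a hypothetical name, and the final division assumes that the norm argument of the observation divides the raw value.

```python
import math


def pointing_error(r_BN, c_hat):
    """Angle (rad) between the nadir direction -r_BN/|r_BN| and c_hat."""
    r_norm = math.sqrt(sum(x * x for x in r_BN))
    cos_angle = sum(-x / r_norm * c for x, c in zip(r_BN, c_hat))
    # Clamp to guard against round-off pushing the cosine outside [-1, 1]
    return math.acos(max(-1.0, min(1.0, cos_angle)))


r_BN = (7000e3, 0.0, 0.0)  # made-up position vector, metres

on_nadir = pointing_error(r_BN, (-1.0, 0.0, 0.0))  # boresight along nadir
sideways = pointing_error(r_BN, (0.0, 1.0, 0.0))   # boresight 90 deg off nadir

print("on nadir (rad):", on_nadir)
print("sideways (rad):", sideways)
# Assumed normalization: raw value divided by norm=np.pi
print("sideways, normalized:", sideways / math.pi)
```

The clamp is an addition for numerical safety; the property in the cell above calls np.arccos directly.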

Next, parameters are set. Since this scenario is focused on maintaining acceptable data and power levels, these are tuned to create a sufficiently interesting mission.

[3]:
sat = ScanningSatellite(
    "Scanner-1",
    sat_args=dict(
        # Data
        dataStorageCapacity=5000 * 8e6,  # bits
        storageInit=lambda: np.random.uniform(0.0, 0.8) * 5000 * 8e6,
        instrumentBaudRate=0.5 * 8e6,
        transmitterBaudRate=-50 * 8e6,
        # Power
        batteryStorageCapacity=200 * 3600,  # W*s
        storedCharge_Init=lambda: np.random.uniform(0.3, 1.0) * 200 * 3600,
        basePowerDraw=-10.0,  # W
        instrumentPowerDraw=-30.0,  # W
        transmitterPowerDraw=-25.0,  # W
        thrusterPowerDraw=-80.0,  # W
        panelArea=0.25,
        # Attitude
        imageAttErrorRequirement=0.1,
        imageRateErrorRequirement=0.1,
        disturbance_vector=lambda: np.random.normal(scale=0.0001, size=3),  # N*m
        maxWheelSpeed=6000.0,  # RPM
        wheelSpeeds=lambda: np.random.uniform(-3000, 3000, 3),
        desatAttitude="nadir",
    )
)
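A rough budget calculation shows the time scales these parameters imply. The sketch below assumes the baud rates are in bits per second, that the base and instrument power draws add, and that there is no solar input; it is a back-of-the-envelope check, not a substitute for the simulation.

```python
# Values copied from sat_args above
storage_capacity = 5000 * 8e6   # bits
instrument_rate = 0.5 * 8e6     # bits/s while scanning (assumed unit)
downlink_rate = 50 * 8e6        # bits/s, magnitude of transmitterBaudRate
battery_capacity = 200 * 3600   # W*s
scan_draw = 10.0 + 30.0         # W, base plus instrument draw magnitudes

orbit_period = 5700.0           # s, the approximate orbit used on this page

fill_time = storage_capacity / instrument_rate   # s to fill an empty buffer
drain_time = storage_capacity / downlink_rate    # s to downlink a full buffer
scan_endurance = battery_capacity / scan_draw    # s of scanning on battery only

print(f"fill buffer:    {fill_time:.0f} s ({fill_time / orbit_period:.2f} orbits)")
print(f"drain buffer:   {drain_time:.0f} s")
print(f"scan endurance: {scan_endurance:.0f} s ({scan_endurance / orbit_period:.2f} orbits)")
```

Under these assumptions the buffer fills within about two orbits of continuous scanning, so an agent that never downlinks will saturate its storage well before the end of an episode.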

Finally, the environment arguments are set. Stepping through this environment is demonstrated at the bottom of the page.

[4]:
duration = 5 * 5700.0  # About 5 orbits
env_args = dict(
    satellite=sat,
    scenario=scene.UniformNadirScanning(value_per_second=1/duration),
    rewarder=data.ScanningTimeReward(),
    time_limit=duration,
    failure_penalty=-1.0,
    terminate_on_time_limit=True,
)
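Setting value_per_second to 1/duration scales the reward so that scanning for the whole episode would be worth 1.0. The sketch below works through that arithmetic, assuming reward accrues linearly with time spent scanning within the pointing requirements.

```python
duration = 5 * 5700.0              # episode time limit, s
value_per_second = 1 / duration    # as passed to UniformNadirScanning
scan_duration = 180.0              # s, from act.Scan(duration=180.0)

# Reward for one scan action that collects data for its full duration
max_scan_reward = scan_duration * value_per_second

# Reward if the satellite could scan for the entire episode
max_episode_reward = duration * value_per_second

print("max reward per scan action:", max_scan_reward)
print("max reward per episode:    ", max_episode_reward)
```

The per-scan upper bound agrees with the largest step rewards in the log at the bottom of the page (0.00631578947368421); scans that begin while the satellite is still slewing earn less.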

Set Up Custom Logging

The bsk_rl package supplies a utility to make logging information at the end of episodes easier. This is useful to see how an agent’s policy is changing over time, using a monitoring program such as TensorBoard. The callback is configured by writing a function that takes the environment as an input and returns a dictionary with values to be logged.

[5]:
def episode_data_callback(env):
    reward = env.rewarder.cum_reward
    reward = sum(reward.values()) / len(reward)
    orbits = env.simulator.sim_time / (95 * 60)

    data = dict(
        reward=reward,
        # Are satellites dying, and how and when?
        alive=float(env.satellite.is_alive()),
        rw_status_valid=float(env.satellite.dynamics.rw_speeds_valid()),
        battery_status_valid=float(env.satellite.dynamics.battery_valid()),
        orbits_complete=orbits,
    )
    if orbits > 0:
        data["reward_per_orbit"] = reward / orbits
    if not env.satellite.is_alive():
        data["orbits_complete_partial_only"] = orbits

    return data
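To see what the callback returns without running a simulation, its logic can be exercised against a stand-in object. In this sketch, episode_summary is a trimmed copy of the callback above and fake_env is a hypothetical stub that exposes only the attributes the function reads; a real run passes the SatelliteTasking environment instead.

```python
from types import SimpleNamespace


def episode_summary(env):
    """Trimmed copy of the callback logic above, for illustration."""
    cum_reward = env.rewarder.cum_reward
    reward = sum(cum_reward.values()) / len(cum_reward)
    orbits = env.simulator.sim_time / (95 * 60)
    summary = dict(
        reward=reward,
        alive=float(env.satellite.is_alive()),
        orbits_complete=orbits,
    )
    if orbits > 0:
        summary["reward_per_orbit"] = reward / orbits
    if not env.satellite.is_alive():
        # Only logged for episodes that ended in a failure
        summary["orbits_complete_partial_only"] = orbits
    return summary


# Hypothetical stand-in: a satellite that failed after four orbits
fake_env = SimpleNamespace(
    rewarder=SimpleNamespace(cum_reward={"Scanner-1": 0.3}),
    simulator=SimpleNamespace(sim_time=4 * 5700.0),
    satellite=SimpleNamespace(is_alive=lambda: False),
)

summary = episode_summary(fake_env)
print(summary)
```

Because orbits_complete_partial_only is only present for failed episodes, its logged average describes how long failing satellites survive rather than the average over all episodes.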

Configure Ray and PPO

PPO (or another algorithm) is configured next. It is particularly important to set sample_timeout_s and metrics_episode_collection_timeout_s to appropriately high values for this environment. The episode_data_callback is included in the environment arguments, and WrappedEpisodeDataCallbacks must be passed to the configuration's callbacks so that logging is triggered.

[6]:
import bsk_rl.utils.rllib  # noqa To access "SatelliteTasking-RLlib"
from ray.rllib.algorithms.ppo import PPOConfig
from bsk_rl.utils.rllib.callbacks import WrappedEpisodeDataCallbacks

N_CPUS = 3

training_args = dict(
    lr=0.00003,
    gamma=0.999,
    train_batch_size=250,  # usually a larger number, like 2500
    num_sgd_iter=10,
    model=dict(fcnet_hiddens=[512, 512], vf_share_layers=False),
    lambda_=0.95,
    use_kl_loss=False,
    clip_param=0.1,
    grad_clip=0.5,
)

config = (
    PPOConfig()
    .training(**training_args)
    .env_runners(num_env_runners=N_CPUS-1, sample_timeout_s=1000.0)
    .environment(
        env="SatelliteTasking-RLlib",
        env_config=dict(**env_args, episode_data_callback=episode_data_callback),
    )
    .reporting(
        metrics_num_episodes_for_smoothing=1,
        metrics_episode_collection_timeout_s=180,
    )
    .checkpointing(export_native_model_files=True)
    .framework(framework="torch")
    .api_stack(
        enable_rl_module_and_learner=True,
        enable_env_runner_and_connector_v2=True,
    )
    .callbacks(WrappedEpisodeDataCallbacks)
)
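The discount factor gamma=0.999 is close to 1 because episodes here are long. A common rule of thumb puts the effective planning horizon at about 1/(1 - gamma) steps. The sketch below compares that with the episode lengths implied by the action durations, assuming every action runs for its full duration and the episode reaches the time limit.

```python
gamma = 0.999
duration = 5 * 5700.0          # episode time limit, s
shortest_action = 60.0         # s, Downlink and Desat
longest_action = 180.0         # s, Scan

effective_horizon = 1.0 / (1.0 - gamma)   # steps, rule of thumb

min_steps = duration / longest_action     # all actions are scans
max_steps = duration / shortest_action    # all actions are 60 s long

# Discount applied to a reward received on the last step of the longest episode
weight_at_max = gamma ** max_steps

print(f"effective horizon: {effective_horizon:.0f} steps")
print(f"episode length:    {min_steps:.0f} to {max_steps:.0f} steps")
print(f"discount at final step of longest episode: {weight_at_max:.3f}")
```

With this choice the horizon exceeds the longest possible episode, so rewards late in an episode still carry substantial weight at the first step.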

Once the PPO configuration has been set, Ray can be started and the agent can be trained.

Training on a reasonably modern machine, we can achieve 5M steps over 32 processors in 6 to 18 hours, depending on specific environment configurations.

Note that the custom logging metrics are reported under env_runners.

[7]:
import ray
from ray import tune

ray.init(
    ignore_reinit_error=True,
    num_cpus=N_CPUS,
    object_store_memory=2_000_000_000,  # 2 GB
)

# Run the training
tune.run(
    "PPO",
    config=config.to_dict(),
    stop={"training_iteration": 10},  # Adjust the number of iterations as needed
    checkpoint_freq=10,
    checkpoint_at_end=True
)

# Shutdown Ray
ray.shutdown()
2025-12-03 22:23:25,199 INFO worker.py:1783 -- Started a local Ray instance.
2025-12-03 22:23:28,750 INFO tune.py:616 -- [output] This uses the legacy output and progress reporter, as Jupyter notebooks are not supported by the new engine, yet. For more information, please see https://github.com/ray-project/ray/issues/36949
/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/gymnasium/spaces/box.py:130: UserWarning: WARN: Box bound precision lowered by casting to float32
  gym.logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/gymnasium/utils/passive_env_checker.py:164: UserWarning: WARN: The obs returned by the `reset()` method was expecting numpy array dtype to be float32, actual type: float64
  logger.warn(
/opt/hostedtoolcache/Python/3.11.14/x64/lib/python3.11/site-packages/gymnasium/utils/passive_env_checker.py:188: UserWarning: WARN: The obs returned by the `reset()` method is not within the observation space.
  logger.warn(f"{pre} is not within the observation space.")

Tune Status

Current time:2025-12-03 22:23:58
Running for: 00:00:30.14
Memory: 4.6/15.6 GiB

System Info

Using FIFO scheduling algorithm.
Logical resource usage: 3.0/3 CPUs, 0/0 GPUs

Trial Status

Trial name                              status      loc              iter  total time (s)  num_env_steps_sampled_lifetime  num_episodes_lifetime  num_env_steps_trained_lifetime
PPO_SatelliteTasking-RLlib_b15b2_00000  TERMINATED  10.1.0.234:4723  10    15.1685         2500                            12                     2500
(PPO pid=4723) Install gputil for GPU system monitoring.

Trial Progress

Trial name: PPO_SatelliteTasking-RLlib_b15b2_00000
env_runners: {'num_env_steps_sampled_lifetime': 25000, 'num_agent_steps_sampled_lifetime': {'default_agent': 13750}, 'num_agent_steps_sampled': {'default_agent': 250}, 'sample': np.float64(1.367194134069598), 'num_module_steps_sampled': {'default_policy': 250}, 'num_episodes': 2, 'num_module_steps_sampled_lifetime': {'default_policy': 13750}, 'num_env_steps_sampled': 250, 'agent_episode_returns_mean': {'default_agent': -0.2040350877192983}, 'episode_duration_sec_mean': 2.5031321624999805, 'episode_return_max': 0.31828070175438594, 'episode_return_mean': -0.2040350877192983, 'time_between_sampling': np.float64(0.1725585620366782), 'episode_len_max': 264, 'module_episode_returns_mean': {'default_policy': -0.2040350877192983}, 'episode_return_min': -0.7263508771929825, 'episode_len_min': 239, 'episode_len_mean': 251.5, 'alive': np.float64(0.5), 'orbits_complete': np.float64(4.715789473684211), 'orbits_complete_partial_only': np.float64(4.431578947368421), 'rw_status_valid': np.float64(1.0), 'reward': np.float64(0.2959649122807019), 'reward_per_orbit': np.float64(0.06270297120473395), 'battery_status_valid': np.float64(0.5)}
fault_tolerance: {'num_healthy_workers': 2, 'num_in_flight_async_reqs': 0, 'num_remote_worker_restarts': 0}
learners: {'default_policy': {'policy_loss': 0.3846481740474701, 'entropy': 1.3383846282958984, 'num_trainable_parameters': 139525.0, 'num_module_steps_trained': 250, 'default_optimizer_learning_rate': 3e-05, 'mean_kl_loss': 0.0, 'vf_loss_unclipped': 0.02403366193175316, 'curr_entropy_coeff': 0.0, 'num_non_trainable_parameters': 0.0, 'gradients_default_optimizer_global_norm': 0.5023936033248901, 'total_loss': 0.40868186950683594, 'vf_loss': 0.02403366193175316, 'vf_explained_var': 0.5888400077819824}, '__all_modules__': {'num_trainable_parameters': 139525.0, 'num_module_steps_trained': 250, 'num_env_steps_trained': 250, 'num_non_trainable_parameters': 0.0, 'total_loss': 0.40868186950683594}}
num_agent_steps_sampled_lifetime: {'default_agent': 2500}
num_env_steps_sampled_lifetime: 2500
num_env_steps_trained_lifetime: 2500
num_episodes_lifetime: 12
perf: {'cpu_util_percent': np.float64(48.5), 'ram_util_percent': np.float64(29.3)}
timers: {'env_runner_sampling_timer': 1.395186658357206, 'learner_update_timer': 0.11432805553085214, 'synch_weights': 0.005468298723549987, 'synch_env_connectors': 0.005859915556338293}
(SingleAgentEnvRunner pid=4770) 2025-12-03 22:23:45,761 sats.satellite.Scanner-1       WARNING    <20280.00> Scanner-1: failed battery_valid check
(SingleAgentEnvRunner pid=4771) 2025-12-03 22:23:51,390 sats.satellite.Scanner-1       WARNING    <26340.00> Scanner-1: failed battery_valid check [repeated 5x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(SingleAgentEnvRunner pid=4770) 2025-12-03 22:23:57,781 sats.satellite.Scanner-1       WARNING    <25260.00> Scanner-1: failed battery_valid check [repeated 3x across cluster]
(PPO pid=4723) Checkpoint successfully created at: Checkpoint(filesystem=local, path=/home/runner/ray_results/PPO_2025-12-03_22-23-28/PPO_SatelliteTasking-RLlib_b15b2_00000_0_2025-12-03_22-23-28/checkpoint_000000)
2025-12-03 22:23:58,925 INFO tune.py:1009 -- Wrote the latest version of all result files and experiment state to '/home/runner/ray_results/PPO_2025-12-03_22-23-28' in 0.0218s.
2025-12-03 22:23:59,019 INFO tune.py:1041 -- Total run time: 30.27 seconds (30.12 seconds for the tuning loop).
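The step counts in the trial status follow from the configuration. The sketch below checks that bookkeeping and estimates how many iterations a 5M-step run would need with the larger batch size mentioned in the configuration comment. It assumes each training iteration samples exactly one batch of train_batch_size environment steps, which is consistent with the totals reported above.

```python
train_batch_size = 250     # value used in this demonstration
iterations = 10            # stop={"training_iteration": 10}

# Environment steps sampled over the whole demonstration run
steps_this_run = train_batch_size * iterations

target_steps = 5_000_000   # full-scale training budget quoted above
larger_batch = 2500        # "usually a larger number, like 2500"

# Iterations required to reach the target with the larger batch size
iterations_needed = target_steps // larger_batch

print("steps in this run:", steps_this_run)
print("iterations for 5M steps at batch 2500:", iterations_needed)
```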

Loading the Policy Network

When using the torch backend, the policy network can be found in the p0 subdirectory of the checkpoint output and in the model subdirectory of the checkpoint output. Use bsk_rl.utils.rllib.load_torch_mlp_policy to load torch policies.

Stepping Through the Environment

The environment is stepped through with random actions to give a sense of how it acts.

[8]:
from bsk_rl import SatelliteTasking

env = SatelliteTasking(**env_args, log_level="INFO")
env.reset()
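terminated = False
truncated = False
# Stop on either termination or truncation, per the Gymnasium step API
while not (terminated or truncated):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)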
2025-12-03 22:24:00,500 gym                            INFO       Resetting environment with seed=208396090
2025-12-03 22:24:00,600 sats.satellite.Scanner-1       INFO       <0.00> Scanner-1: Finding opportunity windows from 0.00 to 28800.00 seconds
2025-12-03 22:24:00,651 gym                            INFO       <0.00> Environment reset
2025-12-03 22:24:00,651 gym                            INFO       <0.00> === STARTING STEP ===
2025-12-03 22:24:00,652 sats.satellite.Scanner-1       INFO       <0.00> Scanner-1: action_nadir_scan tasked for 180.0 seconds
2025-12-03 22:24:00,653 sats.satellite.Scanner-1       INFO       <0.00> Scanner-1: setting timed terminal event at 180.0
2025-12-03 22:24:00,664 sats.satellite.Scanner-1       INFO       <180.00> Scanner-1: timed termination at 180.0 for action_nadir_scan
2025-12-03 22:24:00,664 data.base                      INFO       <180.00> Total reward: {'Scanner-1': 0.002807017543859649}
2025-12-03 22:24:00,665 comm.communication             INFO       <180.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,666 sats.satellite.Scanner-1       INFO       <180.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,668 gym                            INFO       <180.00> Step reward: 0.002807017543859649
2025-12-03 22:24:00,668 gym                            INFO       <180.00> === STARTING STEP ===
2025-12-03 22:24:00,669 sats.satellite.Scanner-1       INFO       <180.00> Scanner-1: action_charge tasked for 120.0 seconds
2025-12-03 22:24:00,669 sats.satellite.Scanner-1       INFO       <180.00> Scanner-1: setting timed terminal event at 300.0
2025-12-03 22:24:00,677 sats.satellite.Scanner-1       INFO       <300.00> Scanner-1: timed termination at 300.0 for action_charge
2025-12-03 22:24:00,677 data.base                      INFO       <300.00> Total reward: {}
2025-12-03 22:24:00,678 comm.communication             INFO       <300.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,678 sats.satellite.Scanner-1       INFO       <300.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,680 gym                            INFO       <300.00> Step reward: 0.0
2025-12-03 22:24:00,681 gym                            INFO       <300.00> === STARTING STEP ===
2025-12-03 22:24:00,682 sats.satellite.Scanner-1       INFO       <300.00> Scanner-1: action_downlink tasked for 60.0 seconds
2025-12-03 22:24:00,682 sats.satellite.Scanner-1       INFO       <300.00> Scanner-1: setting timed terminal event at 360.0
2025-12-03 22:24:00,688 sats.satellite.Scanner-1       INFO       <360.00> Scanner-1: timed termination at 360.0 for action_downlink
2025-12-03 22:24:00,688 data.base                      INFO       <360.00> Total reward: {}
2025-12-03 22:24:00,689 comm.communication             INFO       <360.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,690 sats.satellite.Scanner-1       INFO       <360.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,692 gym                            INFO       <360.00> Step reward: 0.0
2025-12-03 22:24:00,692 gym                            INFO       <360.00> === STARTING STEP ===
2025-12-03 22:24:00,692 sats.satellite.Scanner-1       INFO       <360.00> Scanner-1: action_charge tasked for 120.0 seconds
2025-12-03 22:24:00,693 sats.satellite.Scanner-1       INFO       <360.00> Scanner-1: setting timed terminal event at 480.0
2025-12-03 22:24:00,701 sats.satellite.Scanner-1       INFO       <480.00> Scanner-1: timed termination at 480.0 for action_charge
2025-12-03 22:24:00,702 data.base                      INFO       <480.00> Total reward: {}
2025-12-03 22:24:00,702 comm.communication             INFO       <480.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,703 sats.satellite.Scanner-1       INFO       <480.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,705 gym                            INFO       <480.00> Step reward: 0.0
2025-12-03 22:24:00,706 gym                            INFO       <480.00> === STARTING STEP ===
2025-12-03 22:24:00,707 sats.satellite.Scanner-1       INFO       <480.00> Scanner-1: action_charge tasked for 120.0 seconds
2025-12-03 22:24:00,707 sats.satellite.Scanner-1       INFO       <480.00> Scanner-1: setting timed terminal event at 600.0
2025-12-03 22:24:00,715 sats.satellite.Scanner-1       INFO       <600.00> Scanner-1: timed termination at 600.0 for action_charge
2025-12-03 22:24:00,715 data.base                      INFO       <600.00> Total reward: {}
2025-12-03 22:24:00,716 comm.communication             INFO       <600.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,717 sats.satellite.Scanner-1       INFO       <600.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,718 gym                            INFO       <600.00> Step reward: 0.0
2025-12-03 22:24:00,719 gym                            INFO       <600.00> === STARTING STEP ===
2025-12-03 22:24:00,720 sats.satellite.Scanner-1       INFO       <600.00> Scanner-1: action_downlink tasked for 60.0 seconds
2025-12-03 22:24:00,720 sats.satellite.Scanner-1       INFO       <600.00> Scanner-1: setting timed terminal event at 660.0
2025-12-03 22:24:00,725 sats.satellite.Scanner-1       INFO       <660.00> Scanner-1: timed termination at 660.0 for action_downlink
2025-12-03 22:24:00,726 data.base                      INFO       <660.00> Total reward: {}
2025-12-03 22:24:00,726 comm.communication             INFO       <660.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,727 sats.satellite.Scanner-1       INFO       <660.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,729 gym                            INFO       <660.00> Step reward: 0.0
2025-12-03 22:24:00,729 gym                            INFO       <660.00> === STARTING STEP ===
2025-12-03 22:24:00,730 sats.satellite.Scanner-1       INFO       <660.00> Scanner-1: action_desat tasked for 60.0 seconds
2025-12-03 22:24:00,730 sats.satellite.Scanner-1       INFO       <660.00> Scanner-1: setting timed terminal event at 720.0
2025-12-03 22:24:00,735 sats.satellite.Scanner-1       INFO       <720.00> Scanner-1: timed termination at 720.0 for action_desat
2025-12-03 22:24:00,736 data.base                      INFO       <720.00> Total reward: {}
2025-12-03 22:24:00,736 comm.communication             INFO       <720.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,737 sats.satellite.Scanner-1       INFO       <720.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,739 gym                            INFO       <720.00> Step reward: 0.0
2025-12-03 22:24:00,739 gym                            INFO       <720.00> === STARTING STEP ===
2025-12-03 22:24:00,740 sats.satellite.Scanner-1       INFO       <720.00> Scanner-1: action_downlink tasked for 60.0 seconds
2025-12-03 22:24:00,740 sats.satellite.Scanner-1       INFO       <720.00> Scanner-1: setting timed terminal event at 780.0
2025-12-03 22:24:00,745 sats.satellite.Scanner-1       INFO       <780.00> Scanner-1: timed termination at 780.0 for action_downlink
2025-12-03 22:24:00,745 data.base                      INFO       <780.00> Total reward: {}
2025-12-03 22:24:00,746 comm.communication             INFO       <780.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,746 sats.satellite.Scanner-1       INFO       <780.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,748 gym                            INFO       <780.00> Step reward: 0.0
2025-12-03 22:24:00,749 gym                            INFO       <780.00> === STARTING STEP ===
2025-12-03 22:24:00,750 sats.satellite.Scanner-1       INFO       <780.00> Scanner-1: action_nadir_scan tasked for 180.0 seconds
2025-12-03 22:24:00,750 sats.satellite.Scanner-1       INFO       <780.00> Scanner-1: setting timed terminal event at 960.0
2025-12-03 22:24:00,761 sats.satellite.Scanner-1       INFO       <960.00> Scanner-1: timed termination at 960.0 for action_nadir_scan
2025-12-03 22:24:00,762 data.base                      INFO       <960.00> Total reward: {'Scanner-1': 0.0027368421052631577}
2025-12-03 22:24:00,762 comm.communication             INFO       <960.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,763 sats.satellite.Scanner-1       INFO       <960.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,765 gym                            INFO       <960.00> Step reward: 0.0027368421052631577
2025-12-03 22:24:00,765 gym                            INFO       <960.00> === STARTING STEP ===
2025-12-03 22:24:00,766 sats.satellite.Scanner-1       INFO       <960.00> Scanner-1: action_downlink tasked for 60.0 seconds
2025-12-03 22:24:00,766 sats.satellite.Scanner-1       INFO       <960.00> Scanner-1: setting timed terminal event at 1020.0
2025-12-03 22:24:00,771 sats.satellite.Scanner-1       INFO       <1020.00> Scanner-1: timed termination at 1020.0 for action_downlink
2025-12-03 22:24:00,771 data.base                      INFO       <1020.00> Total reward: {}
2025-12-03 22:24:00,772 comm.communication             INFO       <1020.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,772 sats.satellite.Scanner-1       INFO       <1020.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,774 gym                            INFO       <1020.00> Step reward: 0.0
2025-12-03 22:24:00,775 gym                            INFO       <1020.00> === STARTING STEP ===
2025-12-03 22:24:00,775 sats.satellite.Scanner-1       INFO       <1020.00> Scanner-1: action_desat tasked for 60.0 seconds
2025-12-03 22:24:00,776 sats.satellite.Scanner-1       INFO       <1020.00> Scanner-1: setting timed terminal event at 1080.0
2025-12-03 22:24:00,780 sats.satellite.Scanner-1       INFO       <1080.00> Scanner-1: timed termination at 1080.0 for action_desat
2025-12-03 22:24:00,781 data.base                      INFO       <1080.00> Total reward: {}
2025-12-03 22:24:00,782 comm.communication             INFO       <1080.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,782 sats.satellite.Scanner-1       INFO       <1080.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,784 gym                            INFO       <1080.00> Step reward: 0.0
2025-12-03 22:24:00,784 gym                            INFO       <1080.00> === STARTING STEP ===
2025-12-03 22:24:00,785 sats.satellite.Scanner-1       INFO       <1080.00> Scanner-1: action_nadir_scan tasked for 180.0 seconds
2025-12-03 22:24:00,785 sats.satellite.Scanner-1       INFO       <1080.00> Scanner-1: setting timed terminal event at 1260.0
2025-12-03 22:24:00,796 sats.satellite.Scanner-1       INFO       <1260.00> Scanner-1: timed termination at 1260.0 for action_nadir_scan
2025-12-03 22:24:00,797 data.base                      INFO       <1260.00> Total reward: {'Scanner-1': 0.005859649122807017}
2025-12-03 22:24:00,797 comm.communication             INFO       <1260.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,798 sats.satellite.Scanner-1       INFO       <1260.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,799 gym                            INFO       <1260.00> Step reward: 0.005859649122807017
2025-12-03 22:24:00,800 gym                            INFO       <1260.00> === STARTING STEP ===
2025-12-03 22:24:00,800 sats.satellite.Scanner-1       INFO       <1260.00> Scanner-1: action_nadir_scan tasked for 180.0 seconds
2025-12-03 22:24:00,801 sats.satellite.Scanner-1       INFO       <1260.00> Scanner-1: setting timed terminal event at 1440.0
2025-12-03 22:24:00,812 sats.satellite.Scanner-1       INFO       <1440.00> Scanner-1: timed termination at 1440.0 for action_nadir_scan
2025-12-03 22:24:00,812 data.base                      INFO       <1440.00> Total reward: {'Scanner-1': 0.00631578947368421}
2025-12-03 22:24:00,813 comm.communication             INFO       <1440.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,813 sats.satellite.Scanner-1       INFO       <1440.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,815 gym                            INFO       <1440.00> Step reward: 0.00631578947368421
2025-12-03 22:24:00,816 gym                            INFO       <1440.00> === STARTING STEP ===
2025-12-03 22:24:00,816 sats.satellite.Scanner-1       INFO       <1440.00> Scanner-1: action_nadir_scan tasked for 180.0 seconds
2025-12-03 22:24:00,817 sats.satellite.Scanner-1       INFO       <1440.00> Scanner-1: setting timed terminal event at 1620.0
2025-12-03 22:24:00,828 sats.satellite.Scanner-1       INFO       <1620.00> Scanner-1: timed termination at 1620.0 for action_nadir_scan
2025-12-03 22:24:00,828 data.base                      INFO       <1620.00> Total reward: {'Scanner-1': 0.00631578947368421}
2025-12-03 22:24:00,829 comm.communication             INFO       <1620.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,829 sats.satellite.Scanner-1       INFO       <1620.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,831 gym                            INFO       <1620.00> Step reward: 0.00631578947368421
2025-12-03 22:24:00,832 gym                            INFO       <1620.00> === STARTING STEP ===
2025-12-03 22:24:00,832 sats.satellite.Scanner-1       INFO       <1620.00> Scanner-1: action_downlink tasked for 60.0 seconds
2025-12-03 22:24:00,833 sats.satellite.Scanner-1       INFO       <1620.00> Scanner-1: setting timed terminal event at 1680.0
2025-12-03 22:24:00,838 sats.satellite.Scanner-1       INFO       <1680.00> Scanner-1: timed termination at 1680.0 for action_downlink
2025-12-03 22:24:00,838 data.base                      INFO       <1680.00> Total reward: {}
2025-12-03 22:24:00,839 comm.communication             INFO       <1680.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,839 sats.satellite.Scanner-1       INFO       <1680.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,841 gym                            INFO       <1680.00> Step reward: 0.0
2025-12-03 22:24:00,842 gym                            INFO       <1680.00> === STARTING STEP ===
2025-12-03 22:24:00,842 sats.satellite.Scanner-1       INFO       <1680.00> Scanner-1: action_nadir_scan tasked for 180.0 seconds
2025-12-03 22:24:00,843 sats.satellite.Scanner-1       INFO       <1680.00> Scanner-1: setting timed terminal event at 1860.0
2025-12-03 22:24:00,854 sats.satellite.Scanner-1       INFO       <1860.00> Scanner-1: timed termination at 1860.0 for action_nadir_scan
2025-12-03 22:24:00,854 data.base                      INFO       <1860.00> Total reward: {'Scanner-1': 0.0005263157894736842}
2025-12-03 22:24:00,855 comm.communication             INFO       <1860.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,855 sats.satellite.Scanner-1       INFO       <1860.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,857 gym                            INFO       <1860.00> Step reward: 0.0005263157894736842
2025-12-03 22:24:00,858 gym                            INFO       <1860.00> === STARTING STEP ===
2025-12-03 22:24:00,858 sats.satellite.Scanner-1       INFO       <1860.00> Scanner-1: action_charge tasked for 120.0 seconds
2025-12-03 22:24:00,859 sats.satellite.Scanner-1       INFO       <1860.00> Scanner-1: setting timed terminal event at 1980.0
2025-12-03 22:24:00,867 sats.satellite.Scanner-1       INFO       <1980.00> Scanner-1: timed termination at 1980.0 for action_charge
2025-12-03 22:24:00,867 data.base                      INFO       <1980.00> Total reward: {}
2025-12-03 22:24:00,868 comm.communication             INFO       <1980.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,868 sats.satellite.Scanner-1       INFO       <1980.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,870 gym                            INFO       <1980.00> Step reward: 0.0
2025-12-03 22:24:00,870 gym                            INFO       <1980.00> === STARTING STEP ===
2025-12-03 22:24:00,871 sats.satellite.Scanner-1       INFO       <1980.00> Scanner-1: action_downlink tasked for 60.0 seconds
2025-12-03 22:24:00,871 sats.satellite.Scanner-1       INFO       <1980.00> Scanner-1: setting timed terminal event at 2040.0
2025-12-03 22:24:00,876 sats.satellite.Scanner-1       INFO       <2040.00> Scanner-1: timed termination at 2040.0 for action_downlink
2025-12-03 22:24:00,877 data.base                      INFO       <2040.00> Total reward: {}
2025-12-03 22:24:00,877 comm.communication             INFO       <2040.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,878 sats.satellite.Scanner-1       INFO       <2040.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,879 gym                            INFO       <2040.00> Step reward: 0.0
2025-12-03 22:24:00,880 gym                            INFO       <2040.00> === STARTING STEP ===
2025-12-03 22:24:00,881 sats.satellite.Scanner-1       INFO       <2040.00> Scanner-1: action_desat tasked for 60.0 seconds
2025-12-03 22:24:00,881 sats.satellite.Scanner-1       INFO       <2040.00> Scanner-1: setting timed terminal event at 2100.0
2025-12-03 22:24:00,886 sats.satellite.Scanner-1       INFO       <2100.00> Scanner-1: timed termination at 2100.0 for action_desat
2025-12-03 22:24:00,887 data.base                      INFO       <2100.00> Total reward: {}
2025-12-03 22:24:00,887 comm.communication             INFO       <2100.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,888 sats.satellite.Scanner-1       INFO       <2100.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,889 gym                            INFO       <2100.00> Step reward: 0.0
2025-12-03 22:24:00,890 gym                            INFO       <2100.00> === STARTING STEP ===
2025-12-03 22:24:00,890 sats.satellite.Scanner-1       INFO       <2100.00> Scanner-1: action_charge tasked for 120.0 seconds
2025-12-03 22:24:00,890 sats.satellite.Scanner-1       INFO       <2100.00> Scanner-1: setting timed terminal event at 2220.0
2025-12-03 22:24:00,899 sats.satellite.Scanner-1       INFO       <2220.00> Scanner-1: timed termination at 2220.0 for action_charge
2025-12-03 22:24:00,900 data.base                      INFO       <2220.00> Total reward: {}
2025-12-03 22:24:00,900 comm.communication             INFO       <2220.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,901 sats.satellite.Scanner-1       INFO       <2220.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,903 gym                            INFO       <2220.00> Step reward: 0.0
2025-12-03 22:24:00,903 gym                            INFO       <2220.00> === STARTING STEP ===
2025-12-03 22:24:00,904 sats.satellite.Scanner-1       INFO       <2220.00> Scanner-1: action_charge tasked for 120.0 seconds
2025-12-03 22:24:00,904 sats.satellite.Scanner-1       INFO       <2220.00> Scanner-1: setting timed terminal event at 2340.0
2025-12-03 22:24:00,912 sats.satellite.Scanner-1       INFO       <2340.00> Scanner-1: timed termination at 2340.0 for action_charge
2025-12-03 22:24:00,912 data.base                      INFO       <2340.00> Total reward: {}
2025-12-03 22:24:00,913 comm.communication             INFO       <2340.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,913 sats.satellite.Scanner-1       INFO       <2340.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,915 gym                            INFO       <2340.00> Step reward: 0.0
2025-12-03 22:24:00,916 gym                            INFO       <2340.00> === STARTING STEP ===
2025-12-03 22:24:00,916 sats.satellite.Scanner-1       INFO       <2340.00> Scanner-1: action_nadir_scan tasked for 180.0 seconds
2025-12-03 22:24:00,917 sats.satellite.Scanner-1       INFO       <2340.00> Scanner-1: setting timed terminal event at 2520.0
2025-12-03 22:24:00,928 sats.satellite.Scanner-1       INFO       <2520.00> Scanner-1: timed termination at 2520.0 for action_nadir_scan
2025-12-03 22:24:00,928 data.base                      INFO       <2520.00> Total reward: {'Scanner-1': 0.0031228070175438596}
2025-12-03 22:24:00,929 comm.communication             INFO       <2520.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,930 sats.satellite.Scanner-1       INFO       <2520.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,931 gym                            INFO       <2520.00> Step reward: 0.0031228070175438596
2025-12-03 22:24:00,932 gym                            INFO       <2520.00> === STARTING STEP ===
2025-12-03 22:24:00,933 sats.satellite.Scanner-1       INFO       <2520.00> Scanner-1: action_charge tasked for 120.0 seconds
2025-12-03 22:24:00,933 sats.satellite.Scanner-1       INFO       <2520.00> Scanner-1: setting timed terminal event at 2640.0
2025-12-03 22:24:00,941 sats.satellite.Scanner-1       INFO       <2640.00> Scanner-1: timed termination at 2640.0 for action_charge
2025-12-03 22:24:00,942 data.base                      INFO       <2640.00> Total reward: {}
2025-12-03 22:24:00,942 comm.communication             INFO       <2640.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,943 sats.satellite.Scanner-1       INFO       <2640.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,944 gym                            INFO       <2640.00> Step reward: 0.0
2025-12-03 22:24:00,945 gym                            INFO       <2640.00> === STARTING STEP ===
2025-12-03 22:24:00,945 sats.satellite.Scanner-1       INFO       <2640.00> Scanner-1: action_downlink tasked for 60.0 seconds
2025-12-03 22:24:00,946 sats.satellite.Scanner-1       INFO       <2640.00> Scanner-1: setting timed terminal event at 2700.0
2025-12-03 22:24:00,951 sats.satellite.Scanner-1       INFO       <2700.00> Scanner-1: timed termination at 2700.0 for action_downlink
2025-12-03 22:24:00,952 data.base                      INFO       <2700.00> Total reward: {}
2025-12-03 22:24:00,952 comm.communication             INFO       <2700.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,953 sats.satellite.Scanner-1       INFO       <2700.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,955 gym                            INFO       <2700.00> Step reward: 0.0
2025-12-03 22:24:00,955 gym                            INFO       <2700.00> === STARTING STEP ===
2025-12-03 22:24:00,956 sats.satellite.Scanner-1       INFO       <2700.00> Scanner-1: action_desat tasked for 60.0 seconds
2025-12-03 22:24:00,956 sats.satellite.Scanner-1       INFO       <2700.00> Scanner-1: setting timed terminal event at 2760.0
2025-12-03 22:24:00,961 sats.satellite.Scanner-1       INFO       <2760.00> Scanner-1: timed termination at 2760.0 for action_desat
2025-12-03 22:24:00,961 data.base                      INFO       <2760.00> Total reward: {}
2025-12-03 22:24:00,962 comm.communication             INFO       <2760.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,962 sats.satellite.Scanner-1       INFO       <2760.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,964 gym                            INFO       <2760.00> Step reward: 0.0
2025-12-03 22:24:00,964 gym                            INFO       <2760.00> === STARTING STEP ===
2025-12-03 22:24:00,965 sats.satellite.Scanner-1       INFO       <2760.00> Scanner-1: action_nadir_scan tasked for 180.0 seconds
2025-12-03 22:24:00,965 sats.satellite.Scanner-1       INFO       <2760.00> Scanner-1: setting timed terminal event at 2940.0
2025-12-03 22:24:00,977 sats.satellite.Scanner-1       INFO       <2940.00> Scanner-1: timed termination at 2940.0 for action_nadir_scan
2025-12-03 22:24:00,977 data.base                      INFO       <2940.00> Total reward: {'Scanner-1': 0.0007719298245614035}
2025-12-03 22:24:00,978 comm.communication             INFO       <2940.00> Optimizing data communication between all pairs of satellites
2025-12-03 22:24:00,978 sats.satellite.Scanner-1       INFO       <2940.00> Scanner-1: Satellite Scanner-1 requires retasking
2025-12-03 22:24:00,980 sats.satellite.Scanner-1       WARNING    <2940.00> Scanner-1: failed battery_valid check
2025-12-03 22:24:00,980 gym                            INFO       <2940.00> Step reward: -0.9992280701754386
2025-12-03 22:24:00,981 gym                            INFO       <2940.00> Episode terminated: True
2025-12-03 22:24:00,982 gym                            INFO       <2940.00> Episode truncated: False
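
In the final step of the log above, the satellite fails the battery_valid check, and the step reward drops from the data reward of about 0.00077 to about -0.99923. The short check below recovers the size of that drop from the two logged values. It is only an arithmetic sketch: reading the difference as a failure penalty applied by the environment is an assumption drawn from the log, not something the log states directly.

```python
import math

# Values copied from the final step of the log above (t = 2940 s).
data_reward = 0.0007719298245614035       # "Total reward" for Scanner-1
logged_step_reward = -0.9992280701754386  # "Step reward" after the failed battery_valid check

# Assumption: the gap between the two values is a penalty added when the
# satellite fails a validity check and the episode terminates.
implied_penalty = logged_step_reward - data_reward
print(round(implied_penalty, 6))  # -1.0
```

The episode ends with terminated set to True and truncated set to False, which matches a failure condition rather than a time limit being reached.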