Semi-MDP Discounting in RLlib

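bsk_rl.utils.rllib.discounting is a collection of utilities for semi-MDP-style discounting in RLlib, in which the discount applied to a step depends on that step's duration rather than being a fixed per-step factor.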

See the following examples for how to use these utilities:

class ContinuePreviousAction(*args, **kwargs)

Bases: ConnectorV2

Override actions with NO_ACTION on connector pass if the agent does not require retasking.

class MakeAddedStepActionValid(*args, expected_train_batch_size, **kwargs)

Bases: ConnectorV2

Ensure that padded steps are not duplicates of NO_ACTION steps.

class CondenseMultiStepActions(*args, **kwargs)

Bases: ConnectorV2

Combine steps that used NO_ACTION on connector pass.
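
The idea of combining NO_ACTION steps can be illustrated with a minimal, library-independent sketch. It is not the connector's implementation: the `NO_ACTION` sentinel value, the `condense_steps` helper, the dictionary layout, and the choice to sum rewards and durations without intermediate discounting are all assumptions made for illustration.

```python
# Hypothetical sentinel; the real NO_ACTION value used by bsk_rl may differ.
NO_ACTION = -1


def condense_steps(actions, rewards, d_ts):
    """Fold each NO_ACTION step into the most recent step that chose an action.

    Assumption: rewards and step durations of the folded steps are summed.
    """
    condensed = []
    for action, reward, dt in zip(actions, rewards, d_ts):
        if action == NO_ACTION and condensed:
            # Continuation of the previous action: accumulate into that step.
            condensed[-1]["reward"] += reward
            condensed[-1]["d_ts"] += dt
        else:
            # A real decision (or a leading step with nothing to merge into).
            condensed.append({"action": action, "reward": reward, "d_ts": dt})
    return condensed


steps = condense_steps(
    actions=[2, NO_ACTION, NO_ACTION, 5],
    rewards=[1.0, 0.5, 0.25, 2.0],
    d_ts=[10.0, 10.0, 5.0, 20.0],
)
```

Under these assumptions the four raw steps collapse into two decision steps, so the learner sees one transition per actual decision with a duration covering the whole interval.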

class TimeDiscountedGAEPPOTorchLearner(*args, **kwargs)

Bases: PPOTorchLearner, TimeDiscountedGAEPPOLearner

Discount episodes according to the d_ts value in the info dictionary.
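
As a rough sketch of what duration-based discounting means, the snippet below computes discounted returns where the discount between consecutive steps is `gamma ** d_ts` instead of a fixed `gamma`. It is plain Python and does not reproduce the learner's GAE computation. The helper name `time_discounted_returns`, the interpretation of `d_ts` as the duration of each step, and the convention that step `t`'s duration discounts everything after it are assumptions.

```python
def time_discounted_returns(rewards, d_ts, gamma):
    """Compute G_t = r_t + gamma**d_ts[t] * G_{t+1} for every step t.

    Assumption: d_ts[t] is the duration of step t, and the reward of step t
    is not discounted by its own duration.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return can reuse the one that follows it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + (gamma ** d_ts[t]) * running
        returns[t] = running
    return returns


# The middle step lasts twice as long, so what follows it is discounted twice.
example_returns = time_discounted_returns(
    rewards=[1.0, 1.0, 1.0],
    d_ts=[1.0, 2.0, 1.0],
    gamma=0.5,
)
```

With every `d_ts` equal to 1 this reduces to the usual fixed-factor discounted return, which is a quick way to sanity-check the convention.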

class TimeDiscountedGAEPPOTfLearner(*args, **kwargs)

Bases: PPOTfLearner, TimeDiscountedGAEPPOLearner

Discount episodes according to the d_ts value in the info dictionary.