Semi-MDP Discounting in RLlib
bsk_rl.utils.rllib.discounting is a collection of utilities for semi-MDP-style discounting.
See the following examples for how to use these utilities:
- Time-Discounted GAE: an example of TimeDiscountedGAEPPOTorchLearner in a single-agent case.
- Asynchronous Multiagent Decision Making: an example of the time-discounted learner and connectors (ContinuePreviousAction, MakeAddedStepActionValid, and CondenseMultiStepActions) for asynchronous multi-agent training.
- class ContinuePreviousAction(*args, **kwargs)
  Bases: ConnectorV2
  Override actions with NO_ACTION on the connector pass if the agent does not require retasking.
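The override rule can be illustrated without RLlib as a plain function over per-agent dictionaries. This is a minimal sketch of the idea, not the connector's actual implementation: the `None` sentinel standing in for NO_ACTION, the `"requires_retasking"` info key, and the function name `continue_previous_actions` are all hypothetical assumptions made for illustration.

```python
# Hypothetical sentinel standing in for bsk_rl's NO_ACTION; the real value may differ.
NO_ACTION = None


def continue_previous_actions(actions, infos):
    """Replace an agent's action with NO_ACTION unless it requires retasking.

    actions: dict mapping agent id -> action proposed by the policy.
    infos: dict mapping agent id -> info dict, assumed (hypothetically) to
        carry a boolean "requires_retasking" entry.
    """
    overridden = {}
    for agent_id, action in actions.items():
        # Default to True so agents without the flag keep their proposed action.
        needs_retask = infos.get(agent_id, {}).get("requires_retasking", True)
        overridden[agent_id] = action if needs_retask else NO_ACTION
    return overridden


actions = {"sat_1": 3, "sat_2": 7}
infos = {"sat_1": {"requires_retasking": True}, "sat_2": {"requires_retasking": False}}
print(continue_previous_actions(actions, infos))  # {'sat_1': 3, 'sat_2': None}
```

An agent that is still executing its previous task receives NO_ACTION, so the environment can keep that agent's current behavior instead of applying a fresh policy action.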
- class MakeAddedStepActionValid(*args, expected_train_batch_size, **kwargs)
  Bases: ConnectorV2
  Ensure that padded steps are not duplicates of NO_ACTION steps.
- class CondenseMultiStepActions(*args, **kwargs)
  Bases: ConnectorV2
  Combine steps that used NO_ACTION on the connector pass.
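One way to picture this condensing is to fold each NO_ACTION step into the most recent step that chose a real action, so that the combined step spans the full elapsed time of the continued action. The sketch below is a hypothetical, RLlib-free illustration. The `Step` record, the `None` sentinel for NO_ACTION, and the choice to simply sum rewards and d_ts are assumptions, and observation handling is omitted. It is not the connector's actual logic.

```python
from dataclasses import dataclass

# Hypothetical sentinel standing in for bsk_rl's NO_ACTION; the real value may differ.
NO_ACTION = None


@dataclass
class Step:
    action: object  # action chosen at this step, or NO_ACTION
    reward: float   # reward received during this step
    d_ts: float     # time elapsed during this step


def condense_multi_step_actions(steps):
    """Merge NO_ACTION steps into the most recent step that chose a real action."""
    condensed = []
    for step in steps:
        if step.action is NO_ACTION and condensed:
            prev = condensed[-1]
            # Assumption: rewards and elapsed time simply accumulate across merged steps.
            condensed[-1] = Step(prev.action, prev.reward + step.reward, prev.d_ts + step.d_ts)
        else:
            # Real actions start a new step; a leading NO_ACTION step has
            # nothing to merge into, so it is kept as-is.
            condensed.append(Step(step.action, step.reward, step.d_ts))
    return condensed


steps = [Step(2, 1.0, 10.0), Step(NO_ACTION, 0.5, 5.0), Step(NO_ACTION, 0.0, 5.0), Step(4, 2.0, 10.0)]
print(condense_multi_step_actions(steps))
```

After condensing, each remaining step corresponds to one decision, and its d_ts covers the whole duration of that decision. This is the quantity a time-discounted learner needs.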
- class TimeDiscountedGAEPPOTorchLearner(*args, **kwargs)
  Bases: PPOTorchLearner, TimeDiscountedGAEPPOLearner
  Discount episodes according to the d_ts value in the info dictionary.
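In a semi-MDP, a step that takes longer should be discounted more heavily. A natural way to do this in generalized advantage estimation (GAE) is to replace the fixed discount gamma with a per-step discount gamma ** d_ts[t]. The pure-Python sketch below shows that idea under stated assumptions. It is not the learner's actual code. The function name, the assumption that d_ts is already expressed in the units gamma is defined against, and the choice to apply lambda once per decision step (rather than per unit of time) are all hypothetical.

```python
def time_discounted_gae(rewards, values, d_ts, gamma=0.99, lam=0.95, bootstrap_value=0.0):
    """GAE with a per-step semi-MDP discount gamma ** d_ts[t].

    rewards[t], d_ts[t]: reward and elapsed time for the transition s_t -> s_{t+1}.
    values[t]: value estimate V(s_t) for t = 0..T-1.
    bootstrap_value: V(s_T); use 0.0 if the episode terminated.
    Returns (advantages, value_targets).
    """
    T = len(rewards)
    advantages = [0.0] * T
    next_value = bootstrap_value
    next_adv = 0.0
    for t in reversed(range(T)):
        # Semi-MDP discount: longer steps shrink future value more.
        discount = gamma ** d_ts[t]
        delta = rewards[t] + discount * next_value - values[t]
        # Assumption: lambda is applied once per decision step.
        next_adv = delta + discount * lam * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    value_targets = [a + v for a, v in zip(advantages, values)]
    return advantages, value_targets


adv, targets = time_discounted_gae([1.0, 1.0], [0.0, 0.0], [1.0, 2.0], gamma=0.5, lam=1.0)
print(adv)  # [1.5, 1.0]
```

When every d_ts equals 1, the per-step discount is just gamma and the function reduces to standard GAE. TimeDiscountedGAEPPOTfLearner describes the same discounting for the TensorFlow backend.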
- class TimeDiscountedGAEPPOTfLearner(*args, **kwargs)
  Bases: PPOTfLearner, TimeDiscountedGAEPPOLearner
  Discount episodes according to the d_ts value in the info dictionary.