Semi-MDP Discounting in RLlib
bsk_rl.utils.rllib.discounting is a collection of utilities for semi-MDP style discounting.
See the following examples for how to use these utilities:
- Time-Discounted GAE: An example of TimeDiscountedGAEPPOTorchLearner in a single-agent case.
- Asynchronous Multiagent Decision Making: An example of the time-discounted learner and connectors (ContinuePreviousAction, MakeAddedStepActionValid, and CondenseMultiStepActions) for asynchronous multi-agent training.
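To make the idea of time-discounted GAE concrete, the following is a minimal NumPy sketch and not the code of TimeDiscountedGAEPPOTorchLearner. It assumes that each step k has a known duration, that the discount over that step is gamma raised to the duration, and that lambda is applied once per step; the library may treat lambda or the time units differently. The function name time_discounted_gae and all of its parameters are hypothetical.

```python
import numpy as np


def time_discounted_gae(rewards, values, bootstrap_value, durations,
                        gamma=0.99, lam=0.95):
    """Generalized advantage estimation with a per-step, duration-based discount.

    Assumptions (not taken from the library): gamma is a discount per unit
    time, durations[k] is the elapsed time of step k, and lam is applied
    once per step regardless of duration.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    durations = np.asarray(durations, dtype=float)

    # Value of the state reached after each step; the last one is bootstrapped.
    next_values = np.append(values[1:], bootstrap_value)

    # Semi-MDP discount: longer steps are discounted more heavily.
    step_gamma = gamma ** durations

    # Temporal-difference residuals using the duration-dependent discount.
    deltas = rewards + step_gamma * next_values - values

    # Backward recursion: A_k = delta_k + step_gamma_k * lam * A_{k+1}.
    advantages = np.zeros_like(rewards)
    running = 0.0
    for k in reversed(range(len(rewards))):
        running = deltas[k] + step_gamma[k] * lam * running
        advantages[k] = running
    return advantages
```

When every duration equals 1, this reduces to standard GAE with a fixed per-step discount.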
- class ContinuePreviousAction(*args, **kwargs)[source]
Bases: ConnectorV2
Override actions with NO_ACTION on connector pass if the agent does not require retasking.
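The override described above can be illustrated in plain Python, independent of RLlib. This is only a sketch of the idea, not the connector's implementation: the NO_ACTION value, the function name, and the requires_retasking mapping are all hypothetical stand-ins for whatever the library and environment actually provide.

```python
NO_ACTION = -1  # hypothetical sentinel; use the library's own constant in practice


def continue_previous_action(actions, requires_retasking):
    """Replace the action of every agent that does not need retasking.

    actions: dict mapping agent id to the action proposed by the policy.
    requires_retasking: dict mapping agent id to a bool (hypothetical input).
    Agents missing from requires_retasking are assumed to need retasking.
    """
    return {
        agent: action if requires_retasking.get(agent, True) else NO_ACTION
        for agent, action in actions.items()
    }
```

For instance, an agent that is still executing its previous task would receive NO_ACTION, while an agent that has just finished keeps the action chosen by its policy.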
- class MakeAddedStepActionValid(*args, expected_train_batch_size, **kwargs)[source]
Bases: ConnectorV2
Ensure that padded steps are not duplicates of NO_ACTION steps.
- class CondenseMultiStepActions(*args, **kwargs)[source]
Bases: ConnectorV2
Combine steps that used NO_ACTION on connector pass.
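One plausible reading of combining NO_ACTION steps is sketched below in plain Python. It is an assumption, not the connector's actual logic: each NO_ACTION step is folded into the most recent step that carried a real action, with rewards and durations summed without discounting. The step dictionary layout, the NO_ACTION value, and the function name are hypothetical.

```python
NO_ACTION = -1  # hypothetical sentinel; use the library's own constant in practice


def condense_steps(steps):
    """Merge NO_ACTION steps into the preceding step that chose a real action.

    steps: list of dicts with keys "action", "reward", and "duration"
    (a hypothetical layout). Rewards and durations are summed without
    discounting, which is a simplifying assumption of this sketch.
    """
    condensed = []
    for step in steps:
        if step["action"] == NO_ACTION and condensed:
            # Extend the previous decision instead of recording a new one.
            condensed[-1]["reward"] += step["reward"]
            condensed[-1]["duration"] += step["duration"]
        else:
            # Copy so the caller's data is left unmodified. A leading
            # NO_ACTION step has nothing to merge into and is kept as-is.
            condensed.append(dict(step))
    return condensed
```

The condensed steps then carry the total elapsed time of each decision, which is the quantity a time-based discount would need.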