Policies
deer.base_classes.Policy(learning_algo, …) | Abstract class for all policies.
deer.policies.EpsilonGreedyPolicy(…) | The policy acts greedily with probability \(1-\epsilon\) and acts randomly otherwise.
deer.policies.LongerExplorationPolicy(…[, …]) | Simple alternative to \(\epsilon\)-greedy that can explore more efficiently for a broad class of realistic problems.

class deer.base_classes.Policy(learning_algo, n_actions, random_state)
Abstract class for all policies. A policy takes observations as input and outputs an action.
- learning_algo : object from class LearningAlgo
- n_actions : int or list
- Definition of the action space provided by Environment.nActions()
- random_state : numpy random number generator

action(state)
Main method of the Policy class. It can be called by agent.py, given a state, and should return a valid action w.r.t. the environment given to the constructor.

bestAction(state, mode=None, *args, **kwargs)
Returns the best action for the given state. This is an additional encapsulation for the Q-network.

randomAction()
Returns a random action.
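
The interface described above can be illustrated with a minimal, standalone sketch. It does not import DeeR and is not the library's source: `SketchPolicy` and `ConstantQ` are hypothetical names, `q_vals` is an assumed method on the stand-in learning algorithm, `n_actions` is assumed to be an int, and Python's `random.Random` stands in for the NumPy random number generator to keep the example dependency-free.

```python
import random


class ConstantQ:
    """Hypothetical stand-in for a learning algorithm: returns fixed Q-values."""

    def __init__(self, q_values):
        self.q_values = list(q_values)

    def q_vals(self, state):
        # A real learning algorithm would evaluate its model on `state`.
        return self.q_values


class SketchPolicy:
    """Standalone mirror of the documented Policy interface (not DeeR code)."""

    def __init__(self, learning_algo, n_actions, random_state):
        self.learning_algo = learning_algo
        self.n_actions = n_actions        # assumed to be an int here
        self.random_state = random_state  # assumed: random.Random instance

    def bestAction(self, state):
        # Greedy choice: index of the largest Q-value.
        q = self.learning_algo.q_vals(state)
        return max(range(self.n_actions), key=lambda a: q[a])

    def randomAction(self):
        # Uniform choice over the discrete actions.
        return self.random_state.randrange(self.n_actions)

    def action(self, state):
        # Subclasses decide how to combine bestAction and randomAction.
        raise NotImplementedError


policy = SketchPolicy(ConstantQ([0.1, 0.7, 0.2]), n_actions=3,
                      random_state=random.Random(0))
best_action = policy.bestAction(state=None)
random_action = policy.randomAction()
```

In this sketch `action` is left abstract, matching the description of Policy as an abstract class; concrete policies choose when to call `bestAction` and when to call `randomAction`.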

class deer.policies.EpsilonGreedyPolicy(learning_algo, n_actions, random_state, epsilon)
Bases: deer.base_classes.policy.Policy
The policy acts greedily with probability \(1-\epsilon\) and acts randomly otherwise. It is now used as a default policy for the neural agent.
- epsilon : float
- Proportion of random steps

action(state, mode=None, *args, **kwargs)
Main method of the Policy class. It can be called by agent.py, given a state, and should return a valid action w.r.t. the environment given to the constructor.

epsilon()
Get the epsilon for \(\epsilon\)-greedy exploration.

setEpsilon(e)
Set the epsilon used for \(\epsilon\)-greedy exploration.
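
The \(\epsilon\)-greedy rule itself is short enough to sketch. The following is a minimal standalone illustration, not DeeR's implementation: `EpsilonGreedySketch` and `fixed_q` are hypothetical names, the Q-values come from a plain function rather than a learning algorithm, `n_actions` is assumed to be an int, and `random.Random` replaces the NumPy generator.

```python
import random


class EpsilonGreedySketch:
    """Standalone epsilon-greedy sketch; not DeeR's implementation."""

    def __init__(self, q_function, n_actions, random_state, epsilon):
        self._q_function = q_function      # hypothetical: state -> list of Q-values
        self._n_actions = n_actions        # assumed to be an int
        self._random_state = random_state  # assumed: random.Random instance
        self._epsilon = epsilon

    def action(self, state):
        # Explore with probability epsilon, otherwise act greedily.
        if self._random_state.random() < self._epsilon:
            return self._random_state.randrange(self._n_actions)
        q = self._q_function(state)
        return max(range(self._n_actions), key=lambda a: q[a])

    def epsilon(self):
        return self._epsilon

    def setEpsilon(self, e):
        self._epsilon = e


def fixed_q(state):
    # Hypothetical Q-values in which action 1 is always the best.
    return [0.1, 0.7, 0.2]


policy = EpsilonGreedySketch(fixed_q, n_actions=3,
                             random_state=random.Random(0), epsilon=0.0)
greedy_actions = [policy.action(None) for _ in range(5)]  # epsilon = 0: always greedy

policy.setEpsilon(1.0)
random_actions = [policy.action(None) for _ in range(5)]  # epsilon = 1: always random
```

Exposing `setEpsilon` lets a caller anneal exploration over training, for example by lowering \(\epsilon\) after each epoch.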

class deer.policies.LongerExplorationPolicy(learning_algo, n_actions, random_state, epsilon, length=10)
Bases: deer.base_classes.policy.Policy
Simple alternative to \(\epsilon\)-greedy that can explore more efficiently for a broad class of realistic problems.
- epsilon : float
- Proportion of random steps
- length : int
- Length of the exploration sequences that will be considered

action(state, mode=None, *args, **kwargs)
Main method of the Policy class. It can be called by agent.py, given a state, and should return a valid action w.r.t. the environment given to the constructor.

epsilon()
Get the epsilon.

setEpsilon(e)
Set the epsilon.
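
The reference above does not spell out how exploration sequences are chosen, so the following is only one plausible reading, sketched under explicit assumptions: when exploration is triggered, the policy commits to a uniformly random sequence of `length` actions before acting greedily again. To keep \(\epsilon\) equal to the long-run proportion of random steps under that assumption, a run is triggered at each decision step with probability \(p = \epsilon / (1 + (L-1)(1-\epsilon))\), where \(L\) is `length`; this follows from setting \(pL / (pL + 1 - p) = \epsilon\). The names `SequenceExplorationSketch` and `fixed_q` are hypothetical, and `random.Random` replaces the NumPy generator.

```python
import random


class SequenceExplorationSketch:
    """Commits to random action sequences; an assumption, not DeeR's code."""

    def __init__(self, q_function, n_actions, random_state, epsilon, length=10):
        self._q_function = q_function      # hypothetical: state -> list of Q-values
        self._n_actions = n_actions        # assumed to be an int
        self._random_state = random_state  # assumed: random.Random instance
        self._epsilon = epsilon
        self._length = length
        self._pending = []                 # random actions still to be played

    def trigger_probability(self):
        # Chosen so that epsilon is the long-run fraction of random steps.
        eps, n = self._epsilon, self._length
        return eps / (1 + (n - 1) * (1 - eps))

    def action(self, state):
        if self._pending:
            # Continue the exploration run that is already under way.
            return self._pending.pop()
        if self._random_state.random() < self.trigger_probability():
            # Start a new run: play one random action now, queue the rest.
            sequence = [self._random_state.randrange(self._n_actions)
                        for _ in range(self._length)]
            self._pending = sequence[1:]
            return sequence[0]
        q = self._q_function(state)
        return max(range(self._n_actions), key=lambda a: q[a])

    def epsilon(self):
        return self._epsilon

    def setEpsilon(self, e):
        self._epsilon = e


def fixed_q(state):
    # Hypothetical Q-values in which action 1 is always the best.
    return [0.1, 0.7, 0.2]


policy = SequenceExplorationSketch(fixed_q, n_actions=3,
                                   random_state=random.Random(0),
                                   epsilon=1.0, length=4)
first = policy.action(None)             # epsilon = 1: a run starts immediately
queued_after_first = len(policy._pending)

policy.setEpsilon(0.0)                  # no new runs once the current one ends
rest_of_run = [policy.action(None) for _ in range(3)]
after_run = policy.action(None)         # greedy again
```

Compared with plain \(\epsilon\)-greedy, committing to several consecutive random actions lets the agent move further from the greedy trajectory before control returns to the learned Q-values.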