Policies

deer.base_classes.Policy(learning_algo, …) Abstract class for all policies.
deer.policies.EpsilonGreedyPolicy(…) The policy acts greedily with probability \(1-\epsilon\) and acts randomly otherwise.
deer.policies.LongerExplorationPolicy(…[, …]) Simple alternative to \(\epsilon\)-greedy that can explore more efficiently for a broad class of realistic problems.
class deer.base_classes.Policy(learning_algo, n_actions, random_state)

Abstract class for all policies. A policy takes observations as input, and outputs an action.

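learning_algo : object from class LearningAlgo
n_actions : int or list
Definition of the action space provided by Environment.nActions(): an int gives the number of discrete actions, while a list of [min, max] pairs describes a continuous action space.
random_state : numpy random number generator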
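To make the two forms of n_actions concrete, here is a minimal sketch of a validity check. The helper is_valid_action is hypothetical (it is not part of DeeR) and assumes that an int n_actions means the discrete actions 0 to n_actions - 1, while a list of [min, max] pairs means a continuous action with one bounded component per pair.

```python
import numpy as np


def is_valid_action(action, n_actions):
    """Hypothetical helper: check `action` against an n_actions-style action space.

    Assumes an int n_actions means discrete actions 0 .. n_actions - 1, and a
    list of [min, max] pairs means one bounded continuous component per pair.
    """
    if isinstance(n_actions, int):
        # Discrete space: any integer in [0, n_actions) is valid.
        return isinstance(action, (int, np.integer)) and 0 <= action < n_actions
    # Continuous space: one component per [min, max] pair, each within its bounds.
    action = np.asarray(action, dtype=float)
    if action.shape != (len(n_actions),):
        return False
    return all(lo <= a <= hi for a, (lo, hi) in zip(action, n_actions))


print(is_valid_action(3, 4))                            # True  (discrete, 4 actions)
print(is_valid_action(4, 4))                            # False (out of range)
print(is_valid_action([0.5, -1.0], [[0, 1], [-2, 2]]))  # True  (continuous, 2 components)
```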

action(state)

Main method of the Policy class. Given a state, it is called by the agent (agent.py) and should return an action that is valid with respect to the action space (n_actions) given to the constructor. Concrete policies override this method.

bestAction(state, mode=None, *args, **kwargs)

Returns the best action for the given state, as estimated by the learning algorithm. This is an additional layer of encapsulation around the Q-network.
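In a value-based algorithm, the best action is usually the one with the highest estimated Q-value. The sketch below illustrates that greedy choice for a discrete action space; StubLearningAlgo, its toy linear q_values method and best_action are hypothetical stand-ins, not DeeR's LearningAlgo interface.

```python
import numpy as np


class StubLearningAlgo:
    """Hypothetical stand-in for a learning algorithm that wraps a Q-network."""

    def __init__(self, weights):
        # One row of weights per action: Q(s, a) = weights[a] . s (a toy linear model).
        self.weights = np.asarray(weights, dtype=float)

    def q_values(self, state):
        # Estimated Q-value of every action in `state`.
        return self.weights @ np.asarray(state, dtype=float)


def best_action(learning_algo, state):
    """Greedy choice: index of the action with the largest estimated Q-value."""
    return int(np.argmax(learning_algo.q_values(state)))


algo = StubLearningAlgo([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(best_action(algo, [0.2, 0.9]))  # 1  (Q-values are 0.2, 0.9, 0.55)
```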

randomAction()

Returns an action drawn at random from the action space.
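Random-action sampling can be sketched as follows, assuming an int n_actions means discrete actions 0 to n_actions - 1 and a list of [min, max] pairs means a continuous action whose components are each drawn uniformly within their own bounds. The function random_action is hypothetical; random_state is a numpy.random.RandomState, whose randint and uniform methods do the sampling.

```python
import numpy as np


def random_action(n_actions, random_state):
    """Hypothetical sketch of random-action sampling for an n_actions-style space."""
    if isinstance(n_actions, int):
        # Discrete: uniform integer in [0, n_actions).
        return int(random_state.randint(0, n_actions))
    # Continuous: each component drawn uniformly within its own [min, max) bounds.
    return np.array([random_state.uniform(lo, hi) for lo, hi in n_actions])


rng = np.random.RandomState(123456)
print(random_action(4, rng))                  # an integer in 0..3
print(random_action([[0, 1], [-2, 2]], rng))  # a length-2 array within the bounds
```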

class deer.policies.EpsilonGreedyPolicy(learning_algo, n_actions, random_state, epsilon)

Bases: deer.base_classes.policy.Policy

The policy acts greedily with probability \(1-\epsilon\) and acts randomly otherwise. It is now used as a default policy for the neural agent.

epsilon : float
Probability \(\epsilon\) of taking a random action at each step (the proportion of random steps).

action(state, mode=None, *args, **kwargs)

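Main method of the Policy class. Given a state, it is called by the agent (agent.py) and returns a valid action for the action space given to the constructor: with probability \(\epsilon\) a random action (randomAction()), otherwise the greedy action (bestAction()).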
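The \(\epsilon\)-greedy rule can be sketched as a standalone function. epsilon_greedy_action and the two callables passed to it are hypothetical stand-ins for action(), bestAction() and randomAction(); this illustrates the technique, not DeeR's code.

```python
import numpy as np


def epsilon_greedy_action(state, epsilon, best_action_fn, random_action_fn, random_state):
    """Epsilon-greedy selection: random with probability epsilon, greedy otherwise."""
    if random_state.rand() < epsilon:
        return random_action_fn()   # explore
    return best_action_fn(state)    # exploit


rng = np.random.RandomState(0)
n_actions = 4


def greedy(state):
    return 2  # hypothetical: action 2 is always the "best" action


def uniform():
    return int(rng.randint(0, n_actions))  # hypothetical uniform random action


actions = [epsilon_greedy_action(None, 0.1, greedy, uniform, rng) for _ in range(10000)]
# Expect about 90% greedy picks plus 1/4 of the 10% random picks, i.e. roughly 0.925.
print(sum(a == 2 for a in actions) / len(actions))
```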

epsilon()

Returns the current value of \(\epsilon\) used for \(\epsilon\)-greedy exploration.

setEpsilon(e)

Sets the value of \(\epsilon\) used for \(\epsilon\)-greedy exploration to e.
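Because epsilon() and setEpsilon() let \(\epsilon\) change between steps, a common pattern is to start with a high \(\epsilon\) and lower it as training progresses. The class EpsilonGreedySketch and the linear schedule below are hypothetical illustrations, not part of DeeR.

```python
import numpy as np


class EpsilonGreedySketch:
    """Hypothetical epsilon-greedy policy with epsilon()/setEpsilon()-style accessors."""

    def __init__(self, n_actions, random_state, epsilon, best_action_fn):
        self.n_actions = n_actions
        self.random_state = random_state
        self._epsilon = epsilon
        self._best_action_fn = best_action_fn

    def epsilon(self):
        return self._epsilon

    def setEpsilon(self, e):
        self._epsilon = e

    def action(self, state):
        # Random discrete action with probability epsilon, greedy action otherwise.
        if self.random_state.rand() < self._epsilon:
            return int(self.random_state.randint(0, self.n_actions))
        return self._best_action_fn(state)


policy = EpsilonGreedySketch(4, np.random.RandomState(0), 1.0, lambda state: 2)

# Hypothetical schedule: decay epsilon linearly from 1.0 down to a floor of 0.1.
for step in range(200):
    policy.setEpsilon(max(0.1, 1.0 - 0.01 * step))
    policy.action(None)
print(policy.epsilon())  # 0.1
```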

class deer.policies.LongerExplorationPolicy(learning_algo, n_actions, random_state, epsilon, length=10)

Bases: deer.base_classes.policy.Policy

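Simple alternative to \(\epsilon\)-greedy that can explore more efficiently for a broad class of realistic problems. Instead of taking isolated random actions, it explores with sequences of length consecutive exploratory actions.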

epsilon : float
Proportion of random steps
length : int
Number of consecutive actions in each exploration sequence (default: 10).

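If \(\epsilon\) is to keep its meaning as the proportion of random steps when exploration comes in sequences of \(l\) = length actions, the probability \(p\) of starting a new sequence (from a step that is not already inside one) must be smaller than \(\epsilon\). Assuming each sequence occupies exactly \(l\) steps, each decision either starts a sequence (probability \(p\), \(l\) random steps) or takes one greedy step (probability \(1-p\)), and the following worked calculation gives one consistent choice of \(p\). This is an illustration, not necessarily the rule DeeR uses.

\[
\frac{p\,l}{p\,l + (1-p)} = \epsilon
\;\Longleftrightarrow\; p\,l\,(1-\epsilon) = \epsilon\,(1-p)
\;\Longleftrightarrow\; p = \frac{\epsilon}{1 + (l-1)(1-\epsilon)}.
\]

For \(l = 1\) this reduces to \(p = \epsilon\) (plain \(\epsilon\)-greedy); for \(\epsilon = 0.1\) and \(l = 10\), \(p = 0.1/9.1 \approx 0.011\).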
action(state, mode=None, *args, **kwargs)

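Main method of the Policy class. Given a state, it is called by the agent (agent.py) and returns a valid action for the action space given to the constructor. Within an exploration sequence it returns the next action of that sequence; otherwise it either starts a new sequence or acts greedily.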
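Sequence-based exploration can be sketched as follows. LongerExplorationSketch is hypothetical and makes two assumptions: the probability of starting a run, \(p = \epsilon / (1 + (l-1)(1-\epsilon))\), is chosen so that about a fraction \(\epsilon\) of all steps are random, and each action of a run is drawn independently and uniformly from a discrete action space (DeeR's own sampling scheme may differ).

```python
import numpy as np


class LongerExplorationSketch:
    """Hypothetical sketch: explore in runs of `length` random actions.

    Assumes a discrete action space 0 .. n_actions - 1 and a start probability
    chosen so that the long-run proportion of random steps is epsilon.
    """

    def __init__(self, n_actions, random_state, epsilon, best_action_fn, length=10):
        self.n_actions = n_actions
        self.random_state = random_state
        self._epsilon = epsilon
        self._length = length
        self._best_action_fn = best_action_fn
        self._pending = []            # remaining actions of the current exploration run
        self.last_was_random = False  # whether the last returned action was exploratory

    def action(self, state):
        if not self._pending:
            # Not inside a run: decide whether to start one.
            p_start = self._epsilon / (1 + (self._length - 1) * (1 - self._epsilon))
            if self.random_state.rand() < p_start:
                # Simplification: each action of the run is drawn independently and uniformly.
                self._pending = list(self.random_state.randint(0, self.n_actions, size=self._length))
        if self._pending:
            self.last_was_random = True
            return int(self._pending.pop())
        self.last_was_random = False
        return self._best_action_fn(state)


policy = LongerExplorationSketch(4, np.random.RandomState(0), epsilon=0.3,
                                 best_action_fn=lambda state: 2, length=10)
flags = []
for _ in range(100000):
    policy.action(None)
    flags.append(policy.last_was_random)
print(sum(flags) / len(flags))  # close to 0.3
```

Setting length=1 makes \(p\) equal to \(\epsilon\), so the sketch falls back to plain \(\epsilon\)-greedy.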

epsilon()

Returns the current value of \(\epsilon\).

setEpsilon(e)

Sets \(\epsilon\) to e.