Policies

deer.base_classes.Policy(q_network, ...)
    Abstract class for all policies.

deer.policies.EpsilonGreedyPolicy(q_network, ...)
    The policy acts greedily with probability \(1-\epsilon\) and acts randomly otherwise.

deer.policies.LongerExplorationPolicy(...[, ...])
    Simple alternative to \(\epsilon\)-greedy that can explore more efficiently for a broad class of realistic problems.
Detailed description
class deer.base_classes.Policy(q_network, n_actions, random_state)

    Abstract class for all policies. A policy takes observations as input and outputs an action.

    Parameters:
        q_network : object from class QNetwork
        n_actions : int or list
            Definition of the action space provided by Environment.nActions()
        random_state : numpy random number generator

    Methods:

    action(state)
        Main method of the Policy class. It is called by the agent; given a state, it should return a valid action with respect to the environment given to the constructor.

    bestAction(state)
        Returns the best action for the given state. This is an additional encapsulation of the Q-network.

    randomAction()
        Returns a random action.
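Since the class is abstract, concrete policies are expected to subclass it and implement action. Below is a minimal, self-contained sketch of how the documented interface fits together; it is illustrative only, not the actual deer source, and the qValues accessor on the Q-network object is an assumption.

```python
import numpy as np

class Policy:
    """Sketch of the documented abstract interface (illustrative, not deer source)."""

    def __init__(self, q_network, n_actions, random_state):
        self.q_network = q_network        # object exposing Q-values per action (assumed)
        self.n_actions = n_actions        # size of a discrete action space
        self.random_state = random_state  # numpy random number generator

    def action(self, state):
        # Concrete policies must decide how to act for a given state.
        raise NotImplementedError()

    def bestAction(self, state):
        # Encapsulates the Q-network: pick the action with the highest Q-value.
        return int(np.argmax(self.q_network.qValues(state)))

    def randomAction(self):
        # Uniform draw over the discrete action space.
        return self.random_state.randint(0, self.n_actions)
```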
class deer.policies.EpsilonGreedyPolicy(q_network, n_actions, random_state, epsilon)

    Bases: deer.base_classes.Policy.Policy

    The policy acts greedily with probability \(1-\epsilon\) and acts randomly otherwise. It is used as the default policy for the neural agent.

    Parameters:
        epsilon : float
            Proportion of random steps

    Methods (in addition to those inherited from Policy):

    epsilon()
        Get the epsilon used for \(\epsilon\)-greedy exploration.

    setEpsilon(e)
        Set the epsilon used for \(\epsilon\)-greedy exploration.
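The documented rule (greedy with probability \(1-\epsilon\), random otherwise) can be sketched in a few lines. This is a self-contained illustration, not the actual deer implementation; the qValues accessor on the Q-network is an assumption.

```python
import numpy as np

class EpsilonGreedyPolicy:
    """Acts greedily with probability 1 - epsilon, randomly otherwise (sketch)."""

    def __init__(self, q_network, n_actions, random_state, epsilon):
        self.q_network = q_network
        self.n_actions = n_actions
        self.random_state = random_state
        self._epsilon = epsilon

    def action(self, state):
        if self.random_state.rand() < self._epsilon:
            return self.random_state.randint(0, self.n_actions)  # explore
        return int(np.argmax(self.q_network.qValues(state)))     # exploit

    def epsilon(self):
        return self._epsilon

    def setEpsilon(self, e):
        self._epsilon = e
```

With epsilon = 0 the policy is fully greedy; in practice epsilon is typically annealed from 1 toward a small value over the course of training.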
class deer.policies.LongerExplorationPolicy(q_network, n_actions, random_state, epsilon, length=10)

    Bases: deer.base_classes.Policy.Policy

    Simple alternative to \(\epsilon\)-greedy that can explore more efficiently for a broad class of realistic problems.

    Parameters:
        epsilon : float
            Proportion of random steps
        length : int
            Length of the exploration sequences that will be considered

    Methods (in addition to those inherited from Policy):

    epsilon()
        Get the epsilon used for \(\epsilon\)-greedy exploration.

    setEpsilon(e)
        Set the epsilon used for \(\epsilon\)-greedy exploration.

    sampleUniformActionSequence()
        Samples a uniformly random sequence of actions.
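The documentation does not spell out the exploration mechanism, but the length parameter suggests the idea: instead of taking a single random step, the policy occasionally commits to an entire random action sequence. The sketch below is an assumed implementation of that idea, not the deer source; in particular, the epsilon / length trigger probability, chosen here so the overall proportion of random steps stays near \(\epsilon\), is an assumption.

```python
import numpy as np

class LongerExplorationPolicy:
    """Sketch: explore with multi-step random action sequences (assumed behaviour)."""

    def __init__(self, q_network, n_actions, random_state, epsilon, length=10):
        self.q_network = q_network
        self.n_actions = n_actions
        self.random_state = random_state
        self._epsilon = epsilon
        self._length = length
        self._pending = []  # remaining actions of the current exploration sequence

    def sampleUniformActionSequence(self):
        # Uniformly random sequence of `length` actions.
        return list(self.random_state.randint(0, self.n_actions, size=self._length))

    def action(self, state):
        if self._pending:  # finish the current exploration sequence first
            return self._pending.pop(0)
        # Trigger a new sequence with probability epsilon / length so that
        # roughly a proportion epsilon of all steps end up random (assumption).
        if self.random_state.rand() < self._epsilon / self._length:
            sequence = self.sampleUniformActionSequence()
            self._pending = sequence[1:]
            return sequence[0]
        return int(np.argmax(self.q_network.qValues(state)))  # greedy step
```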