Controller

This module defines the base Controller class and some preset controllers that you can use to control the training and the various parameters of your agents.

Controllers can be attached to an agent using the agent’s attach(Controller) method. The order in which controllers are attached matters: if controllers C1, C2 and C3 are attached in this order and C1 and C3 both listen to the onEpisodeEnd signal, then whenever an episode ends, the onEpisodeEnd() method of C1 is called before that of C3.
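As an illustration, here is a minimal sketch of attaching controllers in order. The agent variable is assumed to be a NeuralAgent built elsewhere (environment, network and agent construction are omitted):

    from deer.experiment import base_controllers as bc

    # Controllers receive each signal in attachment order: C1, then C2, then C3.
    agent.attach(bc.TrainerController())        # C1: trains the agent periodically
    agent.attach(bc.LearningRateController())   # C2: decays the learning rate
    agent.attach(bc.EpsilonController())        # C3: anneals the exploration rate

    # Whenever an episode ends, C1.onEpisodeEnd() runs before C3.onEpisodeEnd().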

Controller() A base controller that does nothing when receiving the various signals emitted by an agent.
LearningRateController([...]) A controller that modifies the learning rate periodically at epoch end.
EpsilonController([initial_e, e_decays, ...]) A controller that modifies the probability “epsilon” of taking a random action periodically.
DiscountFactorController([...]) A controller that modifies the Q-network discount factor periodically.
TrainerController([evaluate_on, ...]) A controller that makes the agent train on its current database periodically.
InterleavedTestEpochController([id, ...]) A controller that interleaves a test epoch between training epochs of the agent.
FindBestController([validationID, testID, ...]) A controller that finds the neural net performing best in validation mode (i.e. for mode = [validationID]).

Detailed description

class deer.experiment.base_controllers.Controller

A base controller that does nothing when receiving the various signals emitted by an agent. This class should be the base class of any controller you would want to define.
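For example, a hypothetical custom controller (not part of the library) could be sketched as follows. It assumes that the flag set by setActive() is available as the _active attribute, as it is in the preset controllers:

    from deer.experiment.base_controllers import Controller

    class EpisodeCounterController(Controller):
        """Hypothetical controller that counts the episodes it has seen."""

        def __init__(self):
            super(EpisodeCounterController, self).__init__()
            self._episode_count = 0

        def onEpisodeEnd(self, agent, terminal_reached, reward):
            # A deactivated controller must ignore the signal (see setActive()).
            if not self._active:
                return
            self._episode_count += 1
            print("Episodes seen so far: {}".format(self._episode_count))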

Methods

onActionChosen(agent, action) Called whenever the agent has chosen an action.
onActionTaken(agent) Called whenever the agent has taken an action on its environment.
onEnd(agent) Called when the agent has finished processing all its epochs, just before returning from its run() method.
onEpisodeEnd(agent, terminal_reached, reward) Called whenever the agent ends an episode, just after this episode ended and before any onEpochEnd() signal could be sent.
onEpochEnd(agent) Called whenever the agent ends an epoch, just after the last episode of this epoch was ended and after any onEpisodeEnd() signal was processed.
onStart(agent) Called when the agent is going to start working (before anything else).
setActive(active) Activate or deactivate this controller.
onActionChosen(agent, action)

Called whenever the agent has chosen an action.

This occurs after the agent’s state has been updated with the new observation, but before the action is applied to the environment and before the total reward is updated.

onActionTaken(agent)

Called whenever the agent has taken an action on its environment.

This occurs after the agent has applied the action to the environment and before terminality is evaluated. This is called only once, even when the agent skips frames by taking the same action multiple times. In other words, this occurs just before the next observation of the environment.

onEnd(agent)

Called when the agent has finished processing all its epochs, just before returning from its run() method.

onEpisodeEnd(agent, terminal_reached, reward)

Called whenever the agent ends an episode, just after this episode ended and before any onEpochEnd() signal could be sent.

Parameters:

agent : NeuralAgent

The agent firing the event

terminal_reached : bool

Whether the episode ended because a terminal transition occurred. This could be False if the episode was stopped because its step budget was exhausted.

reward : float

The reward obtained on the last transition performed in this episode.

onEpochEnd(agent)

Called whenever the agent ends an epoch, just after the last episode of this epoch was ended and after any onEpisodeEnd() signal was processed.

Parameters:

agent : NeuralAgent

The agent firing the event

onStart(agent)

Called when the agent is going to start working (before anything else).

This corresponds to the moment where the agent’s run() method is called.

Parameters:

agent : NeuralAgent

The agent firing the event

setActive(active)

Activate or deactivate this controller.

A controller should not react to any signal it receives as long as it is deactivated. For instance, if a controller maintains a counter on how many episodes it has seen, this counter should not be updated when this controller is disabled.

class deer.experiment.base_controllers.LearningRateController(initial_learning_rate=0.005, learning_rate_decay=1.0, periodicity=1)

Bases: deer.experiment.base_controllers.Controller

A controller that modifies the learning rate periodically at epoch end.

Parameters:

initial_learning_rate : float

The learning rate upon agent start

learning_rate_decay : float

The factor by which the previous learning rate is multiplied every [periodicity] epochs.

periodicity : int

How many epochs are necessary before an update of the learning rate occurs

Methods

onActionChosen(agent, action) Called whenever the agent has chosen an action.
onActionTaken(agent) Called whenever the agent has taken an action on its environment.
onEnd(agent) Called when the agent has finished processing all its epochs, just before returning from its run() method.
onEpisodeEnd(agent, terminal_reached, reward) Called whenever the agent ends an episode, just after this episode ended and before any onEpochEnd() signal could be sent.
onEpochEnd(agent)
onStart(agent)
setActive(active) Activate or deactivate this controller.
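A usage sketch, reusing the agent and the bc alias from the first example. With periodicity=1, the learning rate after k epochs is initial_learning_rate * learning_rate_decay**k:

    agent.attach(bc.LearningRateController(
        initial_learning_rate=0.005,
        learning_rate_decay=0.99,   # multiply the rate by 0.99 at every epoch end
        periodicity=1))
    # After 100 epochs the rate is 0.005 * 0.99**100, i.e. roughly 0.0018.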
class deer.experiment.base_controllers.EpsilonController(initial_e=1.0, e_decays=10000, e_min=0.1, evaluate_on='action', periodicity=1, reset_every='none')

Bases: deer.experiment.base_controllers.Controller

A controller that modifies the probability “epsilon” of taking a random action periodically.

Parameters:

initial_e : float

Start epsilon

e_decays : int

How many updates are necessary for epsilon to reach e_min

e_min : float

End epsilon

evaluate_on : str

After what type of event epsilon should be updated periodically. Possible values: ‘action’, ‘episode’, ‘epoch’.

periodicity : int

How many [evaluate_on] events are necessary before an update of epsilon occurs

reset_every : str

After what type of event epsilon should be reset to its initial value. Possible values: ‘none’, ‘episode’, ‘epoch’.

Methods

onActionChosen(agent, action)
onActionTaken(agent) Called whenever the agent has taken an action on its environment.
onEnd(agent) Called when the agent has finished processing all its epochs, just before returning from its run() method.
onEpisodeEnd(agent, terminal_reached, reward)
onEpochEnd(agent)
onStart(agent)
setActive(active) Activate or deactivate this controller.
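A usage sketch under the same assumptions. The parameters fix the endpoints and length of the annealing schedule; its exact shape between initial_e and e_min is left to the controller’s implementation:

    agent.attach(bc.EpsilonController(
        initial_e=1.0,         # start fully exploratory
        e_decays=10000,        # epsilon reaches e_min after 10000 updates
        e_min=0.1,             # keep 10% random actions afterwards
        evaluate_on='action',  # update epsilon after every action...
        periodicity=1,         # ...without skipping any
        reset_every='none'))   # never reset epsilon to initial_e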
class deer.experiment.base_controllers.DiscountFactorController(initial_discount_factor=0.9, discount_factor_growth=1.0, discount_factor_max=0.99, periodicity=1)

Bases: deer.experiment.base_controllers.Controller

A controller that modifies the Q-network discount factor periodically. More information in: Francois-Lavet Vincent et al. (2015) - How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies (http://arxiv.org/abs/1512.02011).

Parameters:

initial_discount_factor : float

Start discount

discount_factor_growth : float

The factor by which the previous discount is multiplied every [periodicity] epochs.

discount_factor_max : float

Maximum reachable discount

periodicity : int

How many training epochs are necessary before an update of the discount occurs

Methods

onActionChosen(agent, action) Called whenever the agent has chosen an action.
onActionTaken(agent) Called whenever the agent has taken an action on its environment.
onEnd(agent) Called when the agent has finished processing all its epochs, just before returning from its run() method.
onEpisodeEnd(agent, terminal_reached, reward) Called whenever the agent ends an episode, just after this episode ended and before any onEpochEnd() signal could be sent.
onEpochEnd(agent)
onStart(agent)
setActive(active) Activate or deactivate this controller.
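A usage sketch of the growing-discount strategy from the paper cited above, under the same assumptions as the previous examples:

    agent.attach(bc.DiscountFactorController(
        initial_discount_factor=0.9,
        discount_factor_growth=1.02,  # grow the discount by 2% per epoch...
        discount_factor_max=0.99,     # ...until it is capped at 0.99
        periodicity=1))
    # The discount follows min(0.9 * 1.02**k, 0.99) and reaches the cap
    # after 5 epochs (0.9 * 1.02**5 is about 0.994).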
class deer.experiment.base_controllers.TrainerController(evaluate_on='action', periodicity=1, show_episode_avg_V_value=True, show_avg_Bellman_residual=True)

Bases: deer.experiment.base_controllers.Controller

A controller that makes the agent train on its current database periodically.

Parameters:

evaluate_on : str

After what type of event the agent should be trained periodically. Possible values: ‘action’, ‘episode’, ‘epoch’. The first training will occur after the first occurrence of [evaluate_on].

periodicity : int

How many [evaluate_on] events are necessary before a training occurs

show_episode_avg_V_value : bool

Whether to show an informative message after each episode end (and after a training if [evaluate_on] is ‘episode’) about the average V value of this episode

show_avg_Bellman_residual : bool

Whether to show an informative message after each episode end (and after a training if [evaluate_on] is ‘episode’) about the average Bellman residual of this episode

Methods

onActionChosen(agent, action) Called whenever the agent has chosen an action.
onActionTaken(agent)
onEnd(agent) Called when the agent has finished processing all its epochs, just before returning from its run() method.
onEpisodeEnd(agent, terminal_reached, reward)
onEpochEnd(agent)
onStart(agent)
setActive(active) Activate or deactivate this controller.
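A usage sketch under the same assumptions, training once per episode and printing the diagnostics controlled by the two show_* flags:

    agent.attach(bc.TrainerController(
        evaluate_on='episode',            # train once after every episode
        periodicity=1,
        show_episode_avg_V_value=True,    # print the episode's average V value
        show_avg_Bellman_residual=True))  # print the episode's average Bellman residual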
class deer.experiment.base_controllers.InterleavedTestEpochController(id=0, epoch_length=500, controllers_to_disable=[], periodicity=2, show_score=True, summarize_every=10)

Bases: deer.experiment.base_controllers.Controller

A controller that interleaves a test epoch between training epochs of the agent.

Parameters:

id : int

The identifier (>= 0) of the mode each test epoch triggered by this controller will belong to. Can be used to discriminate between datasets in your Environment subclass (this is the argument that will be given to your environment’s reset() method when starting the test epoch).

epoch_length : int

The total number of transitions that will occur during a test epoch. This means that this epoch could feature several episodes if a terminal transition is reached before this budget is exhausted.

controllers_to_disable : list of int

A list of controllers to disable when this controller wants to start a test epoch. These same controllers will be reactivated after this controller has finished dealing with its test epoch.

periodicity : int

How many epochs are necessary before a test epoch is run (this controller’s test epochs included, i.e. one test epoch every [periodicity] epochs). Minimum value: 2.

show_score : bool

Whether to print an informative message on stdout at the end of each test epoch, about the total reward obtained in the course of the test epoch.

summarize_every : int

How many of this controller’s test epochs are necessary before the attached agent’s summarizeTestPerformance() method is called. Give a value <= 0 for “never”. If > 0, the first call will occur just after the first test epoch.

Methods

onActionChosen(agent, action) Called whenever the agent has chosen an action.
onActionTaken(agent) Called whenever the agent has taken an action on its environment.
onEnd(agent) Called when the agent has finished processing all its epochs, just before returning from its run() method.
onEpisodeEnd(agent, terminal_reached, reward) Called whenever the agent ends an episode, just after this episode ended and before any onEpochEnd() signal could be sent.
onEpochEnd(agent)
onStart(agent)
setActive(active) Activate or deactivate this controller.
class deer.experiment.base_controllers.FindBestController(validationID=0, testID=None, unique_fname='nnet')

Bases: deer.experiment.base_controllers.Controller

A controller that finds the neural net performing best in validation mode (i.e. for mode = [validationID]) and computes the associated generalization score in test mode (i.e. for mode = [testID], and this only if [testID] is different from None). This controller should never be disabled by InterleavedTestEpochControllers, as it is meant to work in conjunction with them.

At each epoch end where this controller is active, it will look at the current mode the agent is in.

If the mode matches [validationID], it will take the total reward of the agent on this epoch and compare it to its current best score. If it is better, it will ask the agent to dump its current nnet to disk and update its current best score. In all cases, it saves the validation score obtained in a vector.

If the mode matches [testID], it saves the test (= generalization) score in another vector. Note that if [testID] is None, no test mode scores are ever recorded.

At the end of the experiment (onEnd), if active, this controller will print information about the epoch at which the best neural net was found, together with its generalization score (the latter only if [testID] is different from None). Finally, it will dump a dictionary containing the data for the plots ({n: number of epochs elapsed, ts: test scores, vs: validation scores}). Note that if [testID] is None, the value dumped for the ‘ts’ key is [].

Parameters:

validationID : int

See synopsis

testID : int

See synopsis

unique_fname : str

A unique filename (basename for score and network dumps).

Methods

onActionChosen(agent, action) Called whenever the agent has chosen an action.
onActionTaken(agent) Called whenever the agent has taken an action on its environment.
onEnd(agent)
onEpisodeEnd(agent, terminal_reached, reward) Called whenever the agent ends an episode, just after this episode ended and before any onEpochEnd() signal could be sent.
onEpochEnd(agent)
onStart(agent) Called when the agent is going to start working (before anything else).
setActive(active) Activate or deactivate this controller.
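Putting the pieces together, here is a sketch of the validation/test wiring described above, under the same assumptions as the earlier examples. The mode identifiers are hypothetical values forwarded to your Environment’s reset() method, and the controllers_to_disable indices refer to the attachment order below:

    VALIDATION_MODE, TEST_MODE = 0, 1

    agent.attach(bc.TrainerController())        # index 0
    agent.attach(bc.LearningRateController())   # index 1

    # Index 2: every 2 epochs, run a validation epoch (mode 0). Training,
    # the learning rate schedule and the test-mode controller (indices
    # 0, 1 and 3) are paused while it runs.
    agent.attach(bc.InterleavedTestEpochController(
        id=VALIDATION_MODE, epoch_length=500,
        controllers_to_disable=[0, 1, 3], periodicity=2))

    # Index 3: likewise for test (generalization) epochs in mode 1.
    agent.attach(bc.InterleavedTestEpochController(
        id=TEST_MODE, epoch_length=500,
        controllers_to_disable=[0, 1, 2], periodicity=2))

    # Index 4: tracks the best validation score, asks the agent to dump the
    # corresponding network, and records the matching test scores. It is
    # never disabled, as required above.
    agent.attach(bc.FindBestController(
        validationID=VALIDATION_MODE, testID=TEST_MODE,
        unique_fname='nnet'))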