# Controller

This file defines the base Controller class and some preset controllers that you can use to control the training and the various parameters of your agents.

Controllers can be attached to an agent using the agent’s attach(Controller) method. The order in which controllers are attached matters. Indeed, if controllers C1, C2 and C3 were attached in this order and C1 and C3 both listen to the onEpisodeEnd signal, the onEpisodeEnd() method of C1 will be called before the onEpisodeEnd() method of C3, whenever an episode ends.
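
The ordering rule can be illustrated with a small self-contained sketch. `MiniAgent`, `Recorder` and the local `Controller` class below are hypothetical stand-ins, not DeeR's own classes; they only mimic the documented behavior that a signal is dispatched to controllers in the order they were attached.

```python
class Controller:
    """Stand-in base class (hypothetical): ignores the onEpisodeEnd signal."""
    def onEpisodeEnd(self, agent, terminal_reached, reward):
        pass


class Recorder(Controller):
    """Appends its name to a shared log each time an episode ends."""
    def __init__(self, name, log):
        self.name = name
        self.log = log

    def onEpisodeEnd(self, agent, terminal_reached, reward):
        self.log.append(self.name)


class MiniAgent:
    """Hypothetical agent that can only attach controllers and end an episode."""
    def __init__(self):
        self._controllers = []

    def attach(self, controller):
        self._controllers.append(controller)

    def end_episode(self, terminal_reached=True, reward=0.0):
        # Signals are sent in the order the controllers were attached.
        for controller in self._controllers:
            controller.onEpisodeEnd(self, terminal_reached, reward)


log = []
agent = MiniAgent()
agent.attach(Recorder("C1", log))
agent.attach(Controller())          # C2 does not react to onEpisodeEnd
agent.attach(Recorder("C3", log))
agent.end_episode()
print(log)  # ['C1', 'C3']
```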

| Controller | Description |
| --- | --- |
| Controller() | A base controller that does nothing when receiving the various signals emitted by an agent. |
| LearningRateController([…]) | A controller that modifies the learning rate periodically upon epochs end. |
| EpsilonController([initial_e, e_decays, …]) | A controller that modifies the probability “epsilon” of taking a random action periodically. |
| DiscountFactorController([…]) | A controller that modifies the q-network discount periodically. |
| TrainerController([evaluate_on, …]) | A controller that makes the agent train on its current database periodically. |
| InterleavedTestEpochController([id, …]) | A controller that interleaves a test epoch between training epochs of the agent. |
| FindBestController([validationID, testID, …]) | A controller that finds the neural net performing best in validation mode. |

class deer.experiment.base_controllers.Controller

A base controller that does nothing when receiving the various signals emitted by an agent. This class should be the base class of any controller you would want to define.

onActionChosen(agent, action)

Called whenever the agent has chosen an action.

This occurs after the agent state was updated with the new observation it made, but before it applied this action on the environment and before the total reward is updated.

onActionTaken(agent)

Called whenever the agent has taken an action on its environment.

This occurs after the agent applied this action on the environment and before terminality is evaluated. This is called only once, even in the case where the agent skips frames by taking the same action multiple times. In other words, this occurs just before the next observation of the environment.

onEnd(agent)

Called when the agent has finished processing all its epochs, just before returning from its run() method.

onEpisodeEnd(agent, terminal_reached, reward)

Called whenever the agent ends an episode, just after this episode ended and before any onEpochEnd() signal could be sent.

agent : NeuralAgent
The agent firing the event
terminal_reached : bool
Whether the episode ended because a terminal transition occurred. This could be False if the episode was stopped because its step budget was exhausted.
reward : float
The reward obtained on the last transition performed in this episode.
onEpochEnd(agent)

Called whenever the agent ends an epoch, just after the last episode of this epoch was ended and after any onEpisodeEnd() signal was processed.

agent : NeuralAgent
The agent firing the event
onStart(agent)

Called when the agent is going to start working (before anything else).

This corresponds to the moment where the agent’s run() method is called.

agent : NeuralAgent
The agent firing the event
setActive(active)

Activate or deactivate this controller.

A controller should not react to any signal it receives as long as it is deactivated. For instance, if a controller maintains a counter on how many episodes it has seen, this counter should not be updated when this controller is disabled.
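
As an illustration of the activation rule, here is a minimal sketch of a custom controller that counts episodes. The local `Controller` class is a stand-in written for this example so that it runs without the library, and the `_active` attribute name is an assumption; in real code you would subclass `deer.experiment.base_controllers.Controller` instead.

```python
class Controller:
    """Stand-in for the documented base class (hypothetical re-implementation)."""
    def __init__(self):
        self._active = True  # assumed attribute name

    def setActive(self, active):
        self._active = active

    def onEpisodeEnd(self, agent, terminal_reached, reward):
        pass


class EpisodeCounter(Controller):
    """Counts episodes, but only while the controller is active."""
    def __init__(self):
        super().__init__()
        self.episodes_seen = 0

    def onEpisodeEnd(self, agent, terminal_reached, reward):
        if not self._active:
            return  # a deactivated controller ignores the signal
        self.episodes_seen += 1


counter = EpisodeCounter()
counter.onEpisodeEnd(agent=None, terminal_reached=True, reward=1.0)
counter.setActive(False)
counter.onEpisodeEnd(agent=None, terminal_reached=False, reward=0.0)  # ignored
counter.setActive(True)
counter.onEpisodeEnd(agent=None, terminal_reached=True, reward=0.5)
print(counter.episodes_seen)  # 2
```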

class deer.experiment.base_controllers.LearningRateController(initial_learning_rate=0.005, learning_rate_decay=1.0, periodicity=1)

A controller that modifies the learning rate periodically upon epochs end.

initial_learning_rate : float
The learning rate upon agent start
learning_rate_decay : float
The factor by which the previous learning rate is multiplied every [periodicity] epochs.
periodicity : int
How many epochs are necessary before an update of the learning rate occurs
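
The effect of these three parameters can be sketched as a plain function. This is only an approximation of the described schedule, not the controller's code: `learning_rate_schedule` is a hypothetical helper, and the exact epoch at which the first decay is applied is an assumption.

```python
def learning_rate_schedule(initial_learning_rate=0.005, learning_rate_decay=1.0,
                           periodicity=1, n_epochs=5):
    """Return the learning rate in effect after each of n_epochs epoch ends."""
    periodicity = max(1, periodicity)
    lr = initial_learning_rate
    rates = []
    for epoch in range(1, n_epochs + 1):
        # Assumption: the decay is applied on every periodicity-th epoch end.
        if epoch % periodicity == 0:
            lr *= learning_rate_decay
        rates.append(lr)
    return rates


rates = learning_rate_schedule(0.01, 0.5, periodicity=2, n_epochs=4)
print(rates)
```

With the default `learning_rate_decay=1.0` the learning rate stays constant, so the controller only has a visible effect once a decay below 1.0 is chosen.
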
class deer.experiment.base_controllers.EpsilonController(initial_e=1.0, e_decays=10000, e_min=0.1, evaluate_on='action', periodicity=1, reset_every='none')

A controller that modifies the probability “epsilon” of taking a random action periodically.

initial_e : float
Start epsilon
e_decays : int
How many updates are necessary for epsilon to reach e_min
e_min : float
End epsilon
evaluate_on : str
After what type of event epsilon should be updated periodically. Possible values: ‘action’, ‘episode’, ‘epoch’.
periodicity : int
How many [evaluate_on] events are necessary before an update of epsilon occurs
reset_every : str
After what type of event epsilon should be reset to its initial value. Possible values: ‘none’, ‘episode’, ‘epoch’.
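
A rough sketch of the resulting epsilon schedule follows. It assumes the decay is linear (a constant decrement per update, so that e_min is reached after e_decays updates); the documentation above does not state the shape of the decay, and `epsilon_schedule` is a hypothetical helper, not part of the library.

```python
def epsilon_schedule(initial_e=1.0, e_decays=10000, e_min=0.1, n_updates=5):
    """Return epsilon after each of n_updates updates, assuming linear decay."""
    step = (initial_e - e_min) / e_decays   # assumed constant decrement per update
    e = initial_e
    values = []
    for _ in range(n_updates):
        e = max(e - step, e_min)            # never go below the floor
        values.append(e)
    return values


values = epsilon_schedule(initial_e=1.0, e_decays=4, e_min=0.2, n_updates=6)
print([round(v, 3) for v in values])  # [0.8, 0.6, 0.4, 0.2, 0.2, 0.2]
```
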
class deer.experiment.base_controllers.DiscountFactorController(initial_discount_factor=0.9, discount_factor_growth=1.0, discount_factor_max=0.99, periodicity=1)

A controller that modifies the q-network discount periodically. More information in: Francois-Lavet Vincent et al. (2015) - How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies (http://arxiv.org/abs/1512.02011).

initial_discount_factor : float
Start discount
discount_factor_growth : float
The factor by which the previous discount is multiplied every [periodicity] epochs.
discount_factor_max : float
Maximum reachable discount
periodicity : int
How many training epochs are necessary before an update of the discount occurs
class deer.experiment.base_controllers.TrainerController(evaluate_on='action', periodicity=1, show_episode_avg_V_value=True, show_avg_Bellman_residual=True)

A controller that makes the agent train on its current database periodically.

evaluate_on : str
After what type of event the agent should be trained periodically. Possible values: ‘action’, ‘episode’, ‘epoch’. The first training will occur after the first occurrence of [evaluate_on].
periodicity : int
How many [evaluate_on] events are necessary before a training occurs
show_avg_Bellman_residual : bool
Whether to show an informative message after each episode end (and after a training if [evaluate_on] is ‘episode’) about the average Bellman residual of this episode
show_episode_avg_V_value : bool
Whether to show an informative message after each episode end (and after a training if [evaluate_on] is ‘episode’) about the average V value of this episode
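
The interplay of evaluate_on and periodicity can be sketched with a small counter. `PeriodicTrigger` is a hypothetical class written for this example; the rule it uses (fire on the first matching event, then on every periodicity-th one) is an assumption consistent with the description above, not a copy of the controller's code.

```python
class PeriodicTrigger:
    """Fires on the first matching event, then on every periodicity-th one."""
    def __init__(self, evaluate_on="action", periodicity=1):
        if evaluate_on not in ("action", "episode", "epoch"):
            raise ValueError("evaluate_on must be 'action', 'episode' or 'epoch'")
        self.evaluate_on = evaluate_on
        self.periodicity = max(1, periodicity)
        self._count = 0

    def notify(self, event):
        """Return True when a training step should run for this event."""
        if event != self.evaluate_on:
            return False
        fire = self._count % self.periodicity == 0
        self._count += 1
        return fire


trigger = PeriodicTrigger(evaluate_on="episode", periodicity=3)
events = ["action", "episode", "action", "episode", "episode", "episode", "epoch"]
fired = [i for i, event in enumerate(events) if trigger.notify(event)]
print(fired)  # [1, 5]
```
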
class deer.experiment.base_controllers.InterleavedTestEpochController(id=0, epoch_length=500, controllers_to_disable=[], periodicity=2, show_score=True, summarize_every=10)

A controller that interleaves a test epoch between training epochs of the agent.

id : int
The identifier (>= 0) of the mode each test epoch triggered by this controller will belong to. Can be used to discriminate between datasets in your Environment subclass (this is the argument that will be given to your environment’s reset() method when starting the test epoch).
epoch_length : float
The total number of transitions that will occur during a test epoch. This means that this epoch could feature several episodes if a terminal transition is reached before this budget is exhausted.
controllers_to_disable : list of int
A list of controllers to disable when this controller wants to start a test epoch. These same controllers will be reactivated after this controller has finished dealing with its test epoch.
periodicity : int
How many epochs are necessary before a test epoch is run (this controller’s test epochs included: “1 test epoch every [periodicity] epochs”). Minimum value: 2.
show_score : bool
Whether to print an informative message on stdout at the end of each test epoch, about the total reward obtained in the course of the test epoch.
summarize_every : int
How many of this controller’s test epochs are necessary before the attached agent’s summarizeTestPerformance() method is called. Give a value <= 0 for “never”. If > 0, the first call will occur just after the first test epoch.
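
To make periodicity and summarize_every more concrete, the sketch below labels a sequence of epochs and lists which test epochs would be followed by a summary. Both helpers are hypothetical. The position of the test epoch inside each block of [periodicity] epochs is an assumption (here it follows the first training epoch of the block), as is the exact summary rule beyond "the first call occurs just after the first test epoch".

```python
def epoch_kinds(n_epochs, periodicity=2):
    """Label each epoch 'train' or 'test', with one test epoch per periodicity epochs."""
    if periodicity < 2:
        raise ValueError("periodicity must be at least 2")
    # Assumption: the test epoch is the second epoch of each block.
    return ["test" if i % periodicity == 1 else "train" for i in range(n_epochs)]


def summarized_tests(n_tests, summarize_every=10):
    """Return the 0-based indices of the test epochs followed by a summary."""
    if summarize_every <= 0:
        return []  # a value <= 0 means "never"
    # Assumption: summaries follow test epochs 0, summarize_every, 2 * summarize_every, ...
    return [t for t in range(n_tests) if t % summarize_every == 0]


print(epoch_kinds(4, periodicity=2))           # ['train', 'test', 'train', 'test']
print(epoch_kinds(6, periodicity=3))           # ['train', 'test', 'train', 'train', 'test', 'train']
print(summarized_tests(7, summarize_every=3))  # [0, 3, 6]
```
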
class deer.experiment.base_controllers.FindBestController(validationID=0, testID=None, unique_fname='nnet')

A controller that finds the neural net performing best in validation mode (i.e. for mode = [validationID]) and computes the associated generalization score in test mode (i.e. for mode = [testID], and only if [testID] is not None). This controller should never be disabled by an InterleavedTestEpochController, as it is meant to work in conjunction with such controllers.

At each epoch end where this controller is active, it will look at the current mode the agent is in.

If the mode matches [validationID], it will take the total reward of the agent on this epoch and compare it to its current best score. If it is better, it will ask the agent to dump its current nnet on disk and update its current best score. In all cases, it saves the validation score obtained in a vector.

If the mode matches [testID], it saves the test (= generalization) score in another vector. Note that if [testID] is None, no test mode scores are ever recorded.

At the end of the experiment (onEnd), if active, this controller will print information about the epoch at which the best neural net was found, together with its generalization score; the latter is shown only if [testID] is not None. Finally, it will dump a dictionary containing the data of the plots ({n: number of epochs elapsed, ts: test scores, vs: validation scores}). Note that if [testID] is None, the value dumped for the ‘ts’ key is [].

validationID : int
See synopsis
testID : int or None
See synopsis
unique_fname : str
A unique filename (basename for score and network dumps).
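
The bookkeeping described in the synopsis can be sketched as a function that replays the per-epoch results. `find_best` is a hypothetical helper, not the controller's code: it takes (mode, total reward) pairs in the order the epochs ended, ignores any mode other than the validation and test ones, and does not write anything to disk. Reading the generalization score at the same index as the best validation score assumes that validation and test epochs alternate.

```python
def find_best(epoch_results, validationID=0, testID=None):
    """Replay (mode, total_reward) pairs, one per finished epoch, in order.

    Returns a dict with the validation scores ('vs'), the test scores ('ts')
    and the 0-based index of the best validation epoch ('best').
    """
    validation_scores, test_scores = [], []
    best_score, best_index = float("-inf"), None
    for mode, score in epoch_results:
        if mode == validationID:
            if score > best_score:
                # The real controller would ask the agent to dump its network here.
                best_score, best_index = score, len(validation_scores)
            validation_scores.append(score)
        elif testID is not None and mode == testID:
            test_scores.append(score)
    return {"vs": validation_scores, "ts": test_scores, "best": best_index}


results = [(0, 1.0), (1, 0.8), (0, 3.0), (1, 2.5), (0, 2.0), (1, 1.9)]
summary = find_best(results, validationID=0, testID=1)
print(summary["best"], summary["ts"][summary["best"]])  # 1 2.5
```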