Q-networks
deer.base_classes.QNetwork(environment, ...)
    All the Q-network classes should inherit this interface.
deer.q_networks.q_net_theano.MyQNetwork(...)
    Deep Q-learning network using Theano
deer.q_networks.q_net_lasagne.MyQNetwork(...)
    Deep Q-learning network using Lasagne
Detailed description

class deer.base_classes.QNetwork(environment, batch_size)

    All the Q-network classes should inherit this interface.

    Parameters:
        environment : object from class Environment
            The environment linked to the Q-network
        batch_size : int
            Number of tuples taken into account for each iteration of gradient descent

    Methods:
        chooseBestAction(state) -- Get the best action for a belief state
        discountFactor() -- Get the discount factor
        learningRate() -- Get the learning rate
        qValues(state) -- Get the Q-values for one belief state
        setDiscountFactor(df) -- Set the discount factor
        setLearningRate(lr) -- Set the learning rate
        train(states, actions, rewards, nextStates, terminals) -- Perform the Bellman iteration for one batch of tuples
chooseBestAction(state)
    Get the best action for a belief state.

discountFactor()
    Get the discount factor.

learningRate()
    Get the learning rate.

qValues(state)
    Get the Q-values for one belief state.

setDiscountFactor(df)
    Set the discount factor.

    Parameters:
        df : float
            The discount factor that has to be set

setLearningRate(lr)
    Set the learning rate.

    Parameters:
        lr : float
            The learning rate that has to be set

train(states, actions, rewards, nextStates, terminals)
    This method performs the Bellman iteration for one batch of tuples.
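To illustrate the shape of this interface, here is a minimal sketch of a conforming class that backs the Q-function with a lookup table instead of a neural network. This is not part of deer; the class name, the dict-based Q-table, and the hashable-state assumption are purely illustrative.

```python
class TabularQNetwork:
    """Illustrative (hypothetical) implementation of the QNetwork
    interface, storing Q-values in a dict keyed by (state, action).
    A real deer Q-network approximates Q with a neural network."""

    def __init__(self, n_actions, batch_size, lr=0.1, df=0.95):
        self._n_actions = n_actions
        self._batch_size = batch_size
        self._lr = lr
        self._df = df
        self._q = {}  # (state, action) -> Q-value, default 0.0

    def qValues(self, state):
        # Q-values for every action in the given belief state
        return [self._q.get((state, a), 0.0) for a in range(self._n_actions)]

    def chooseBestAction(self, state):
        # Index of the action with the highest Q-value
        q = self.qValues(state)
        return q.index(max(q))

    def learningRate(self):
        return self._lr

    def setLearningRate(self, lr):
        self._lr = lr

    def discountFactor(self):
        return self._df

    def setDiscountFactor(self, df):
        self._df = df

    def train(self, states, actions, rewards, nextStates, terminals):
        # One Bellman iteration over a batch of transition tuples;
        # returns the average squared TD error, mirroring train().
        losses = []
        for s, a, r, s2, t in zip(states, actions, rewards, nextStates, terminals):
            target = r if t else r + self._df * max(self.qValues(s2))
            delta = target - self._q.get((s, a), 0.0)
            self._q[(s, a)] = self._q.get((s, a), 0.0) + self._lr * delta
            losses.append(delta ** 2)
        return sum(losses) / len(losses)
```

The setters and getters are trivial here, but they exist on the interface so that an agent can anneal the learning rate and discount factor during training without knowing the concrete Q-network class.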
class deer.q_networks.q_net_theano.MyQNetwork(environment, rho, rms_epsilon, momentum, clip_delta, freeze_interval, batch_size, network_type, update_rule, batch_accumulator, randomState, DoubleQ=False, TheQNet=<class 'deer.q_networks.NN_theano.NN'>)

    Bases: deer.base_classes.QNetwork.QNetwork

    Deep Q-learning network using Theano

    Parameters:
        environment : object from class Environment
        rho : float
        rms_epsilon : float
        momentum : float
        clip_delta : float
        freeze_interval : int
        batch_size : int
            Number of tuples taken into account for each iteration of gradient descent
        network_type : str
        update_rule : str
        batch_accumulator : str
        randomState : numpy random number generator
        DoubleQ : bool, optional
            Whether to activate Double Q-learning (default: False). More information in: Hado van Hasselt et al. (2015) - Deep Reinforcement Learning with Double Q-learning.
        TheQNet : object, optional
            Default is deer.q_networks.NN_theano

    Methods:
        chooseBestAction
        discountFactor
        learningRate
        qValues
        setDiscountFactor
        setLearningRate
        toDump
        train
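The DoubleQ flag changes how the Bellman target is built for the batch. A hedged NumPy sketch of the difference (the function and array names are illustrative, not deer's internals): standard deep Q-learning takes the max of the frozen target network's Q-values at the next state, whereas Double Q-learning selects the argmax action with the online network and evaluates that action with the target network, which reduces the overestimation bias described by van Hasselt et al. (2015).

```python
import numpy as np

def bellman_targets(rewards, terminals, q_next_online, q_next_target,
                    discount, double_q=False):
    """Bellman targets for one batch (illustrative sketch).

    q_next_online / q_next_target : (batch, n_actions) Q-values of the
    online and frozen target networks at the next states.
    """
    if double_q:
        # Double Q: online network picks the action,
        # target network evaluates it
        best = np.argmax(q_next_online, axis=1)
        next_vals = q_next_target[np.arange(len(best)), best]
    else:
        # Standard DQN: max over the target network's Q-values
        next_vals = np.max(q_next_target, axis=1)
    # Terminal transitions contribute only the immediate reward
    return rewards + discount * next_vals * (1.0 - terminals)
```

The freeze_interval parameter governs how often the frozen target network used here is synchronized with the online network.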
chooseBestAction(state)
    Get the best action for a belief state.

    Returns:
        The best action : int

qValues(state_val)
    Get the Q-values for one belief state.

    Returns:
        The Q-values for the provided belief state

train(states_val, actions_val, rewards_val, next_states_val, terminals_val)
    Train one batch.

    1. Set the shared variables states_shared, next_states_shared, actions_shared, rewards_shared and terminals_shared
    2. Perform the batch training

    Parameters:
        states_val : list of batch_size * [list of max_num_elements * [list of k * [element 2D, 1D or scalar]]]
        actions_val : b x 1 numpy array of integers
        rewards_val : b x 1 numpy array
        next_states_val : list of batch_size * [list of max_num_elements * [list of k * [element 2D, 1D or scalar]]]
        terminals_val : b x 1 numpy boolean array (currently ignored)

    Returns:
        Average loss of the batch training
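Once the shared variables are set, the batch update amounts to a gradient step on a TD-error loss. A minimal NumPy sketch of the loss that train() averages over the batch, with the caveat that the function name is hypothetical and the clip_delta handling shown is one common reading of that parameter (clipping the TD error before squaring), not a verified transcription of deer's Theano graph:

```python
import numpy as np

def batch_loss(q_vals, actions_val, targets, clip_delta=0.0):
    """Average loss over one batch (illustrative sketch).

    q_vals : (batch, n_actions) online-network Q-values for states_val
    actions_val : (batch,) integer actions taken
    targets : (batch,) Bellman targets r + gamma * max_a' Q_target(s', a')
    """
    # Pick out Q(s, a) for the action actually taken in each tuple
    q_taken = q_vals[np.arange(len(actions_val)), actions_val]
    delta = targets - q_taken
    if clip_delta > 0:
        # Assumed interpretation of clip_delta: bound the TD error
        # to stabilize the gradient step
        delta = np.clip(delta, -clip_delta, clip_delta)
    # Averaged squared TD error, matching train()'s return value
    return float(np.mean(delta ** 2))
```

In the Theano implementation the same quantity is compiled into a training function, so that setting the shared variables and calling it performs the gradient update and returns this average loss in one pass.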