Q-networks
deer.base_classes.QNetwork(environment, ...)
    All the Q-network classes should inherit this interface.
deer.q_networks.q_net_theano.MyQNetwork(...)
    Deep Q-learning network using Theano
deer.q_networks.q_net_lasagne.MyQNetwork(...)
    Deep Q-learning network using Lasagne
Detailed description

class deer.base_classes.QNetwork(environment, batch_size)

    All the Q-network classes should inherit this interface.

    Parameters:
        environment : object from class Environment
            The environment linked to the Q-network
        batch_size : int
            Number of tuples taken into account for each iteration of gradient descent

    Methods:
        chooseBestAction(state) -- Get the best action for a belief state
        discountFactor() -- Get the discount factor
        learningRate() -- Get the learning rate
        qValues(state) -- Get the Q-values for one belief state
        setDiscountFactor(df) -- Set the discount factor
        setLearningRate(lr) -- Set the learning rate
        train(states, actions, rewards, nextStates, terminals) -- Perform the Bellman iteration for one batch of tuples
chooseBestAction(state)
    Get the best action for a belief state.

discountFactor()
    Get the discount factor.

learningRate()
    Get the learning rate.

qValues(state)
    Get the Q-values for one belief state.

setDiscountFactor(df)
    Set the discount factor.

    Parameters:
        df : float
            The discount factor that has to be set

setLearningRate(lr)
    Set the learning rate.

    Parameters:
        lr : float
            The learning rate that has to be set

train(states, actions, rewards, nextStates, terminals)
    This method performs the Bellman iteration for one batch of tuples.
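To illustrate the shape of this interface, here is a minimal sketch of a conforming class that backs the Q-function with a lookup table instead of a neural network. This is not part of deer; the class name, the dict-based Q-table, and the hashable-state assumption are purely illustrative.

```python
class TabularQNetwork:
    """Illustrative (hypothetical) implementation of the QNetwork
    interface, storing Q-values in a dict keyed by (state, action).
    A real deer Q-network approximates Q with a neural network."""

    def __init__(self, n_actions, batch_size, lr=0.1, df=0.95):
        self._n_actions = n_actions
        self._batch_size = batch_size
        self._lr = lr
        self._df = df
        self._q = {}  # (state, action) -> Q-value, default 0.0

    def qValues(self, state):
        # Q-values for every action in the given belief state
        return [self._q.get((state, a), 0.0) for a in range(self._n_actions)]

    def chooseBestAction(self, state):
        # Index of the action with the highest Q-value
        q = self.qValues(state)
        return q.index(max(q))

    def learningRate(self):
        return self._lr

    def setLearningRate(self, lr):
        self._lr = lr

    def discountFactor(self):
        return self._df

    def setDiscountFactor(self, df):
        self._df = df

    def train(self, states, actions, rewards, nextStates, terminals):
        # One Bellman iteration over a batch of transition tuples;
        # returns the average squared TD error, mirroring train().
        losses = []
        for s, a, r, s2, t in zip(states, actions, rewards, nextStates, terminals):
            target = r if t else r + self._df * max(self.qValues(s2))
            delta = target - self._q.get((s, a), 0.0)
            self._q[(s, a)] = self._q.get((s, a), 0.0) + self._lr * delta
            losses.append(delta ** 2)
        return sum(losses) / len(losses)
```

The setters and getters are trivial here, but they exist on the interface so that an agent can anneal the learning rate and discount factor during training without knowing the concrete Q-network class.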
class deer.q_networks.q_net_theano.MyQNetwork(environment, rho, rms_epsilon, momentum, clip_delta, freeze_interval, batch_size, network_type, update_rule, batch_accumulator, randomState, DoubleQ=False, TheQNet=<class 'deer.q_networks.NN_theano.NN'>)

    Bases: deer.base_classes.QNetwork.QNetwork

    Deep Q-learning network using Theano

    Parameters:
        environment : object from class Environment
        rho : float
        rms_epsilon : float
        momentum : float
        clip_delta : float
        freeze_interval : int
        batch_size : int
            Number of tuples taken into account for each iteration of gradient descent
        network_type : str
        update_rule : str
        batch_accumulator : str
        randomState : numpy random number generator
        DoubleQ : bool, optional
            Whether to activate Double Q-learning (default: False). More information in: Hado van Hasselt et al. (2015) - Deep Reinforcement Learning with Double Q-learning.
        TheQNet : object, optional
            Default is deer.q_networks.NN_theano

    Methods:
        chooseBestAction
        discountFactor
        learningRate
        qValues
        setDiscountFactor
        setLearningRate
        toDump
        train
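The DoubleQ flag changes how the Bellman target is built for the batch. A hedged NumPy sketch of the difference (the function and array names are illustrative, not deer's internals): standard deep Q-learning takes the max of the frozen target network's Q-values at the next state, whereas Double Q-learning selects the argmax action with the online network and evaluates that action with the target network, which reduces the overestimation bias described by van Hasselt et al. (2015).

```python
import numpy as np

def bellman_targets(rewards, terminals, q_next_online, q_next_target,
                    discount, double_q=False):
    """Bellman targets for one batch (illustrative sketch).

    q_next_online / q_next_target : (batch, n_actions) Q-values of the
    online and frozen target networks at the next states.
    """
    if double_q:
        # Double Q: online network picks the action,
        # target network evaluates it
        best = np.argmax(q_next_online, axis=1)
        next_vals = q_next_target[np.arange(len(best)), best]
    else:
        # Standard DQN: max over the target network's Q-values
        next_vals = np.max(q_next_target, axis=1)
    # Terminal transitions contribute only the immediate reward
    return rewards + discount * next_vals * (1.0 - terminals)
```

The freeze_interval parameter governs how often the frozen target network used here is synchronized with the online network.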
chooseBestAction(state)
    Get the best action for a belief state.

    Returns:
        The best action : int

qValues(state_val)
    Get the Q-values for one belief state.

    Returns:
        The Q-values for the provided belief state

train(states_val, actions_val, rewards_val, next_states_val, terminals_val)
    Train one batch.

    1. Set the shared variables states_shared, next_states_shared, actions_shared, rewards_shared and terminals_shared
    2. Perform the batch training

    Parameters:
        states_val : list of batch_size * [list of max_num_elements * [list of k * [element 2D, 1D or scalar]]]
        actions_val : b x 1 numpy array of integers
        rewards_val : b x 1 numpy array
        next_states_val : list of batch_size * [list of max_num_elements * [list of k * [element 2D, 1D or scalar]]]
        terminals_val : b x 1 numpy boolean array (currently ignored)

    Returns:
        Average loss of the batch training
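Once the shared variables are set, the batch update amounts to a gradient step on a TD-error loss. A minimal NumPy sketch of the loss that train() averages over the batch, with the caveat that the function name is hypothetical and the clip_delta handling shown is one common reading of that parameter (clipping the TD error before squaring), not a verified transcription of deer's Theano graph:

```python
import numpy as np

def batch_loss(q_vals, actions_val, targets, clip_delta=0.0):
    """Average loss over one batch (illustrative sketch).

    q_vals : (batch, n_actions) online-network Q-values for states_val
    actions_val : (batch,) integer actions taken
    targets : (batch,) Bellman targets r + gamma * max_a' Q_target(s', a')
    """
    # Pick out Q(s, a) for the action actually taken in each tuple
    q_taken = q_vals[np.arange(len(actions_val)), actions_val]
    delta = targets - q_taken
    if clip_delta > 0:
        # Assumed interpretation of clip_delta: bound the TD error
        # to stabilize the gradient step
        delta = np.clip(delta, -clip_delta, clip_delta)
    # Averaged squared TD error, matching train()'s return value
    return float(np.mean(delta ** 2))
```

In the Theano implementation the same quantity is compiled into a training function, so that setting the shared variables and calling it performs the gradient update and returns this average loss in one pass.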