Q-networks

deer.base_classes.QNetwork(environment, ...) All the Q-network classes should inherit this interface.
deer.q_networks.q_net_theano.MyQNetwork(...) Deep Q-learning network using Theano
deer.q_networks.q_net_lasagne.MyQNetwork(...) Deep Q-learning network using Lasagne

Detailed description

class deer.base_classes.QNetwork(environment, batchSize)

All the Q-network classes should inherit this interface.

Parameters:

environment : object from class Environment

The environment linked to the Q-network

batch_size : int

Number of tuples taken into account for each iteration of gradient descent

Methods

chooseBestAction(state) Get the best action for a belief state
discountFactor() Get the discount factor
learningRate() Get the learning rate
qValues(state) Get the Q-values for one belief state
setDiscountFactor(df) Set the discount factor
setLearningRate(lr) Set the learning rate
train(states, actions, rewards, nextStates, ...) Perform the Bellman iteration for one batch of tuples
chooseBestAction(state)

Get the best action for a belief state

discountFactor()

Get the discount factor

learningRate()

Get the learning rate

qValues(state)

Get the Q-values for one belief state
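
For illustration, these query methods are typically combined into an exploration policy. The sketch below implements epsilon-greedy action selection on top of any QNetwork subclass; the names epsilon_greedy_action, qnetwork, state and n_actions are assumptions for this example, not part of the interface.

    import numpy as np

    def epsilon_greedy_action(qnetwork, state, n_actions, epsilon=0.1, rng=np.random):
        # With probability epsilon explore with a uniformly random action,
        # otherwise exploit the current Q-estimates through the interface above.
        if rng.rand() < epsilon:
            return rng.randint(n_actions)
        return qnetwork.chooseBestAction(state)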

setDiscountFactor(df)

Set the discount factor

Parameters:

df : float

The discount factor that has to be set

setLearningRate(lr)

Set the learning rate

Parameters:

lr : float

The learning rate that has to be set
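
The two setters above make it possible to anneal hyperparameters from the outside during training. A minimal sketch, assuming a qnetwork instance and a hypothetical run_one_epoch training routine (neither is part of the library):

    def anneal_learning_rate(qnetwork, n_epochs, initial_lr=0.005, decay=0.99):
        # Decay the learning rate multiplicatively after every training epoch.
        qnetwork.setLearningRate(initial_lr)
        for epoch in range(n_epochs):
            run_one_epoch(qnetwork)  # hypothetical: one epoch of agent training
            qnetwork.setLearningRate(qnetwork.learningRate() * decay)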

train(states, actions, rewards, nextStates, terminals)

This method performs the Bellman iteration for one batch of tuples.
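
Concretely, the Bellman iteration builds, for every tuple in the batch, the target r + gamma * max_a Q(s', a) (zeroed after terminal transitions) and regresses Q(s, a) towards it. The skeleton below is a minimal sketch of what a custom subclass has to provide, assuming rewards and terminals are numpy arrays; the numpy target computation stands in for the backend-specific training step and is not the library's implementation:

    import numpy as np
    from deer.base_classes import QNetwork

    class MySimpleQNetwork(QNetwork):
        """Minimal sketch of the interface; not a library class."""

        def train(self, states, actions, rewards, nextStates, terminals):
            # Bellman targets: reward plus discounted value of the best action
            # in the successor state, zeroed for terminal transitions.
            next_q = np.array([self.qValues(s) for s in nextStates])
            targets = rewards + self.discountFactor() \
                * (1 - terminals) * next_q.max(axis=1)
            # ... regress Q(states, actions) towards `targets` with the chosen
            # backend and return the average loss of the batch ...

        def chooseBestAction(self, state):
            # Greedy action: index of the largest Q-value for this state.
            return int(np.argmax(self.qValues(state)))

        def qValues(self, state):
            # ... forward pass of the underlying function approximator ...
            raise NotImplementedError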

class deer.q_networks.q_net_theano.MyQNetwork(environment, rho, rms_epsilon, momentum, clip_delta, freeze_interval, batchSize, network_type, update_rule, batch_accumulator, randomState, DoubleQ=False, TheQNet=<class 'deer.q_networks.NN_theano.NN'>)

Bases: deer.base_classes.QNetwork.QNetwork

Deep Q-learning network using Theano

Parameters:

environment : object from class Environment

rho : float

Decay rate of the RMSProp moving average

rms_epsilon : float

Epsilon term added for numerical stability in the RMSProp update

momentum : float

Momentum of the gradient descent update

clip_delta : float

Threshold used to clip the temporal-difference error (0 disables clipping)

freeze_interval : int

Number of training steps between two updates of the frozen target network

batch_size : int

Number of tuples taken into account for each iteration of gradient descent

network_type : str

Name of the network architecture to instantiate

update_rule : str

Name of the update rule used for gradient descent

batch_accumulator : str

How the losses of the individual tuples are accumulated over the batch

randomState : numpy random number generator

DoubleQ : bool, optional

Whether to activate Double Q-learning (default: False). For more information, see Hado van Hasselt et al. (2015), "Deep Reinforcement Learning with Double Q-learning". A sketch of the corresponding target computation is given after this parameter list.

TheQNet : object, optional

Default is deer.q_networks.NN_theano.NN
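
As referenced in the description of DoubleQ above, the sketch below contrasts the two target computations. It is a conceptual numpy rendering of the update rule, not the Theano code used internally; the function name and arguments are assumptions for this example.

    import numpy as np

    def bellman_targets(rewards, terminals, gamma, q_next_online, q_next_frozen, double_q):
        # q_next_online / q_next_frozen: batch_size x n_actions Q-values of the
        # next states under the online and the frozen (target) network.
        if double_q:
            # Double Q-learning: the online network selects the action,
            # the frozen network evaluates it (van Hasselt et al., 2015).
            best_actions = q_next_online.argmax(axis=1)
            next_values = q_next_frozen[np.arange(len(q_next_frozen)), best_actions]
        else:
            # Standard DQN: the frozen network both selects and evaluates.
            next_values = q_next_frozen.max(axis=1)
        return rewards + gamma * (1 - terminals) * next_values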

Methods

chooseBestAction
discountFactor
learningRate
qValues
setDiscountFactor
setLearningRate
toDump
train
chooseBestAction(state)

Get the best action for a belief state

Returns:

The best action : int

qValues(state_val)

Get the Q-values for one belief state

Returns:

The Q-values for the provided belief state
train(states_val, actions_val, rewards_val, next_states_val, terminals_val)

Train one batch.

  1. Set the shared variables states_shared, next_states_shared, actions_shared, rewards_shared and terminals_shared
  2. Perform one step of batch training
Parameters:

states_val : list of batch_size * [list of max_num_elements * [list of k * [element 2D, 1D or scalar]]]

actions_val : batch_size x 1 numpy array of integers

rewards_val : batch_size x 1 numpy array

next_states_val : list of batch_size * [list of max_num_elements * [list of k * [element 2D, 1D or scalar]]]

terminals_val : batch_size x 1 numpy boolean array (currently ignored)

Returns:

The average loss over the batch
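
Putting it together, a typical call builds the batch from a replay memory and feeds it to train. The snippet below is a hedged sketch: the constructor values are illustrative, env and sample_batch are hypothetical (an Environment subclass and a replay-memory helper returning arrays shaped as documented above), and the valid network_type strings depend on the installed version.

    import numpy as np
    from deer.q_networks.q_net_theano import MyQNetwork

    qnetwork = MyQNetwork(
        environment=env,                  # your Environment subclass (assumed)
        rho=0.95,                         # illustrative RMSProp decay
        rms_epsilon=0.0001,
        momentum=0.,
        clip_delta=1.0,
        freeze_interval=1000,             # steps between target-network refreshes
        batchSize=32,
        network_type="General_DQN_0",     # assumption: check your version
        update_rule="rmsprop",
        batch_accumulator="sum",
        randomState=np.random.RandomState(23),
        DoubleQ=True)

    # One training step on a batch sampled from a replay memory.
    states, actions, rewards, next_states, terminals = sample_batch(32)
    loss = qnetwork.train(states, actions, rewards, next_states, terminals)
    print("average batch loss:", loss)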