Implementing Deep Q-Learning using Tensorflow

This article demonstrates how to apply reinforcement learning to a larger environment than the ones covered previously. We will implement the Deep Q-Learning technique using TensorFlow.

Note: A graphics rendering library is required for the following demonstration. On Windows, PyOpenGL is suggested, while on Ubuntu, OpenGL is recommended.

Step 1: Importing the required libraries

# Numerical computation and the environment suite
import numpy as np
import gym

# Keras components for building the neural network
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

# keras-rl components: the DQN agent, exploration policy and replay memory
from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory
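
Note: the rl package used above comes from keras-rl. If you are on TensorFlow 2, the keras-rl2 fork (which targets the Keras bundled with TensorFlow) can be used instead; under that assumption, the Keras imports would look roughly like this:

# Equivalent Keras imports when using TensorFlow 2 with the keras-rl2 fork
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten
from tensorflow.keras.optimizers import Adam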
Step 2: Building the Environment

Note: A preloaded environment from OpenAI’s gym module will be used; gym contains many different environments for different purposes, and the full list can be viewed on their website.

Here, the ‘MountainCar-v0’ environment will be used. In this environment, a car (the agent) is stuck between two mountains and has to drive up one of them. The car’s engine is not strong enough to climb on its own, so the car has to build momentum to get uphill. After the setup code below, a short optional inspection of the environment’s observation and action spaces is shown.

# Building the environment
environment_name = 'MountainCar-v0'
env = gym.make(environment_name)
np.random.seed(0)
env.seed(0)

# Extracting the number of possible actions
num_actions = env.action_space.n
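
MountainCar-v0 exposes a two-dimensional continuous observation (the car’s position and velocity) and three discrete actions (push left, do nothing, push right). The following optional check prints these spaces and takes one random step; it assumes the classic gym step API that returns a (state, reward, done, info) tuple, which is the same API the seeding code above targets:

# Optional: inspecting what the environment provides
print(env.observation_space)   # Box(2,): car position and velocity
print(env.action_space)        # Discrete(3): push left, no push, push right

# One random step to see the data the agent will receive
state = env.reset()
next_state, reward, done, info = env.step(env.action_space.sample())
print(state, next_state, reward, done)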
Step 3: Building the learning agent

The learning agent will be built using a deep neural network; for this purpose, we will use the Sequential class of the Keras module.

# Building the neural network that estimates one Q-value per action
agent = Sequential()
agent.add(Flatten(input_shape = (1, ) + env.observation_space.shape))
agent.add(Dense(16))
agent.add(Activation('relu'))
agent.add(Dense(num_actions))
agent.add(Activation('linear'))
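
As a quick sanity check, Keras can print the network’s architecture; for MountainCar-v0 this should show a Flatten layer feeding a 16-unit hidden layer and a 3-unit linear output, one Q-value per action:

# Optional: printing the network architecture
agent.summary()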
Step 4: Finding the Optimal Strategy

Here, the neural network is wrapped in a DQNAgent, which combines it with an epsilon-greedy exploration policy and a replay memory; the agent is then trained directly on the environment.

# Building the model to find the optimal strategy
strategy = EpsGreedyQPolicy()
memory = SequentialMemory(limit = 10000, window_length = 1)
dqn = DQNAgent(model = agent, nb_actions = num_actions,
               memory = memory, nb_steps_warmup = 10,
               target_model_update = 1e-2, policy = strategy)
dqn.compile(Adam(lr = 1e-3), metrics = ['mae'])

# Visualizing the training
dqn.fit(env, nb_steps = 5000, visualize = True, verbose = 2)

The agent tries different methods to reach the top, gaining knowledge from each episode.
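
If you want to keep the result of training, keras-rl agents can write their weights to disk. A minimal sketch, assuming the (arbitrary) file name 'dqn_MountainCar-v0_weights.h5f':

# Saving the trained weights for later reuse (file name is arbitrary)
dqn.save_weights('dqn_MountainCar-v0_weights.h5f', overwrite = True)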

Step 5: Testing the Learning Agent

# Testing the learning agent
dqn.test(env, nb_episodes = 5, visualize = True)

The agent applies its knowledge to reach the top.
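
To reuse the agent in a later session, the weights saved earlier can be loaded back into an identically constructed DQNAgent before testing; it is also good practice to close the environment’s render window when done. A sketch, assuming the same (arbitrary) file name used in the saving step:

# Reloading the previously saved weights (same arbitrary file name) and cleaning up
dqn.load_weights('dqn_MountainCar-v0_weights.h5f')
dqn.test(env, nb_episodes = 5, visualize = True)
env.close()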
