In this project, we explore model-based reinforcement learning applied to playing Atari games from raw images. Search-based methods such as AlphaGo have recently proven effective for complex, long-horizon planning in games such as Go. In the general setting we lack a perfect model of the environment, unlike in Go, but recent work has explored learning predictive models of Atari games from video frames. This project therefore combines model-based control with an investigation of existing search algorithms. We used the Arcade Learning Environment (ALE) for both data collection and evaluation, and implemented a model-free policy gradient agent as well as a predictive model based on convolutional and recurrent neural networks (CNNs and RNNs), the latter serving as the environment model for a tree-search planning algorithm. We evaluated our methods against a human benchmark and DQN on two simple games. While we did not outperform DQN, we surpassed human performance in Pong with both the policy gradient method and MCTS. Our results suggest that MCTS is a promising approach to planning in reinforcement learning tasks.
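To make the planning component concrete, the sketch below shows a minimal Monte Carlo tree search driven by a generic one-step model. It is an illustrative sketch, not the project's exact implementation: the `model(state, action) -> (next_state, reward)` interface, the UCB exploration constant, the rollout depth, and the toy stand-in model are all assumptions standing in for the learned CNN/RNN frame predictor and the ALE action set.

```python
import math
import random

class Node:
    """One node in the search tree: tracks visit counts and accumulated reward."""
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action          # action that led here from the parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0

def ucb1(node, c=1.4):
    # Upper-confidence bound: balances exploiting high-value children
    # against exploring rarely visited ones.
    if node.visits == 0:
        return float("inf")
    return (node.total_reward / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root_state, model, actions, n_simulations=200, rollout_depth=20, gamma=0.99):
    """Plan one action by building a search tree over a learned model.

    `model` is an assumed one-step interface standing in for the
    CNN/RNN predictive model; it need only map (state, action) to
    (next_state, reward).
    """
    root = Node(root_state)
    for _ in range(n_simulations):
        node = root
        # 1. Selection: descend via UCB1 while nodes are fully expanded.
        while node.children and len(node.children) == len(actions):
            node = max(node.children, key=ucb1)
        # 2. Expansion: try one untried action from this node.
        tried = {child.action for child in node.children}
        untried = [a for a in actions if a not in tried]
        if untried:
            a = random.choice(untried)
            next_state, reward = model(node.state, a)
            node.children.append(Node(next_state, parent=node, action=a))
            node = node.children[-1]
        else:
            reward = 0.0
        # 3. Rollout: estimate the leaf's value with a random policy.
        value, state, discount = reward, node.state, gamma
        for _ in range(rollout_depth):
            state, r = model(state, random.choice(actions))
            value += discount * r
            discount *= gamma
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += value
            node = node.parent
    # Act greedily with respect to visit counts at the root.
    return max(root.children, key=lambda ch: ch.visits).action

# Toy stand-in model (hypothetical): a 1-D chain where moving right pays reward.
toy_model = lambda s, a: (s + a, 1.0 if a == 1 else 0.0)
print(mcts(root_state=0, model=toy_model, actions=[-1, 1]))  # prints 1
```

In the full pipeline, `model` would wrap the learned frame predictor and `actions` would be the game's ALE action set; the toy chain is substituted here only so the sketch runs standalone.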