Model-Based Reinforcement Learning for Playing Atari Games

In this project, we explore model-based reinforcement learning applied to playing Atari games from images. Recently, search-based systems such as AlphaGo have proven effective for complex, long-horizon planning in games such as Go. In more general settings we lack the perfect environment model that Go provides, but recent work has explored learning predictive models of Atari games from video frames. This project therefore combines model-based control with an investigation of existing search algorithms. We used the Arcade Learning Environment (ALE) for both data collection and evaluation, and implemented a model-free policy gradient agent as well as a predictive model, built from convolutional neural networks (CNNs) and recurrent neural networks (RNNs), that serves as the environment model for a planner based on Monte Carlo tree search (MCTS). We evaluate our methods against a human benchmark and DQN on two simple games. While we did not outperform DQN, both the policy gradient agent and MCTS surpassed human performance in Pong. Our results suggest that MCTS is a promising approach to planning in reinforcement learning tasks.
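
To make the planning component concrete, below is a minimal sketch of one common form of MCTS (UCT selection with random rollouts) driven by a learned one-step model. It is illustrative only: the `LearnedModel` class, its `step(state, action)` interface, and all constants here are hypothetical stand-ins, not the project's actual CNN/RNN predictor or search implementation.

```python
import math
import random


class LearnedModel:
    """Hypothetical stand-in for the learned dynamics model. In the project
    this role is played by the CNN/RNN frame predictor; a toy deterministic
    transition is used here so the sketch stays self-contained."""

    def step(self, state, action):
        next_state = (state * 31 + action + 1) % 1000  # toy transition
        reward = 1.0 if next_state % 7 == 0 else 0.0   # toy sparse reward
        return next_state, reward


class Node:
    """One node of the search tree over model states."""

    def __init__(self, state):
        self.state = state
        self.visits = 0
        self.value = 0.0    # sum of sampled returns backed up through this node
        self.children = {}  # action -> Node


def rollout(model, state, actions, depth, gamma):
    """Estimate a leaf's value with a random rollout through the model."""
    ret, discount = 0.0, 1.0
    for _ in range(depth):
        state, reward = model.step(state, random.choice(actions))
        ret += discount * reward
        discount *= gamma
    return ret


def uct_select(node, actions, c=1.4):
    """Pick the child action maximizing the UCB1 score."""
    def score(a):
        child = node.children[a]
        return (child.value / child.visits
                + c * math.sqrt(math.log(node.visits) / child.visits))
    return max(actions, key=score)


def simulate(model, node, actions, depth, gamma=0.99):
    """One MCTS iteration: select with UCB1, expand an untried action,
    roll out through the model, and back up the discounted return."""
    node.visits += 1
    if depth == 0:
        return 0.0
    untried = [a for a in actions if a not in node.children]
    if untried:                                   # expansion + rollout
        a = random.choice(untried)
        next_state, reward = model.step(node.state, a)
        child = node.children[a] = Node(next_state)
        child.visits += 1
        ret = reward + gamma * rollout(model, next_state, actions, depth - 1, gamma)
    else:                                         # selection + recursion
        a = uct_select(node, actions)
        child = node.children[a]
        _, reward = model.step(node.state, a)
        ret = reward + gamma * simulate(model, child, actions, depth - 1, gamma)
    child.value += ret                            # backup
    return ret


def plan(model, root_state, actions, iterations=500, depth=20):
    """Run MCTS from the root state and return the most-visited action."""
    root = Node(root_state)
    for _ in range(iterations):
        simulate(model, root, actions, depth)
    return max(root.children, key=lambda a: root.children[a].visits)


if __name__ == "__main__":
    action = plan(LearnedModel(), root_state=0, actions=[0, 1, 2, 3])
    print("planned first action:", action)
```

In the full system, `step` would be replaced by the learned CNN/RNN frame predictor, with states given by predicted frames or their latent features rather than the toy integers used above.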