Over the last several years, deep learning algorithms have achieved dramatic successes across a wide range of application areas. The recently introduced deep Q-learning algorithm represents the first convincing combination of deep learning with reinforcement learning: it is able to learn policies for Atari 2600 games that approach or exceed human performance. The work presented here introduces an open-source implementation of the deep Q-learning algorithm and explores the impact of several key hyper-parameters on the algorithm's success. The results suggest that, at least for some games, the algorithm is very sensitive to hyper-parameter selection: within a narrow window of values it reliably learns high-quality policies, while outside that window learning is unsuccessful. This brittleness with respect to hyper-parameter selection may make it difficult to extend the use of deep Q-learning beyond the Atari 2600 domain.