Reinforcement Learning in Video Games Using Nearest Neighbor Interpolation and Metric Learning

Reinforcement learning (RL) has had mixed success when applied to games. Large state spaces and the curse of dimensionality have limited the ability of RL techniques to learn to play complex games in a reasonable length of time. We discuss a modification of Q-learning that uses nearest neighbor states to exploit previous experience in the early stages of learning. A weighting on the state features is learned using metric learning techniques, so that neighboring states represent similar game situations. Our method is tested on the arcade game Frogger, and we show that some of the effects of the curse of dimensionality can be mitigated.
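The core idea described above can be sketched as follows: a tabular Q-learner that, when asked for the value of a state, averages the Q-values of its k nearest stored states under a weighted Euclidean distance, where the per-feature weights stand in for a metric learned offline. This is a minimal illustrative sketch, not the paper's implementation; the class name, parameters, and exact-match update rule are assumptions.

```python
import numpy as np

class NNQAgent:
    """Q-learning that bootstraps values for unseen states from their
    nearest visited neighbors under a learned feature weighting.
    (Illustrative sketch; names and defaults are not from the paper.)"""

    def __init__(self, n_actions, feature_weights, k=3, alpha=0.5, gamma=0.9):
        self.n_actions = n_actions
        # feature_weights would come from a metric-learning step;
        # here it is simply supplied by the caller.
        self.w = np.asarray(feature_weights, dtype=float)
        self.k, self.alpha, self.gamma = k, alpha, gamma
        self.states = []  # visited state feature vectors
        self.qvals = []   # parallel list of per-action Q-value arrays

    def _dist(self, s, t):
        # weighted Euclidean distance: larger weights mark more
        # important features, so "nearby" means "similar game situation"
        return np.sqrt(np.sum(self.w * (np.asarray(s) - np.asarray(t)) ** 2))

    def q(self, s, a):
        """Estimate Q(s, a) as the mean over the k nearest stored states."""
        if not self.states:
            return 0.0
        d = np.array([self._dist(s, t) for t in self.states])
        nearest = np.argsort(d)[: self.k]
        return float(np.mean([self.qvals[i][a] for i in nearest]))

    def update(self, s, a, r, s_next):
        """Standard Q-learning backup; neighbors supply the bootstrap value."""
        target = r + self.gamma * max(
            self.q(s_next, b) for b in range((self.n_actions))
        )
        # exact-match lookup; otherwise store the state as a new exemplar
        for i, t in enumerate(self.states):
            if np.array_equal(s, t):
                self.qvals[i][a] += self.alpha * (target - self.qvals[i][a])
                return
        row = np.zeros(self.n_actions)
        row[a] = self.alpha * target
        self.states.append(np.asarray(s, dtype=float))
        self.qvals.append(row)
```

Because queries fall back on nearby exemplars, early experience generalizes to states the agent has never visited, which is the mechanism the abstract credits with mitigating the curse of dimensionality.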
