Robust high performance reinforcement learning through weighted k-nearest neighbors

This paper presents a family of robust, high-performance, award-winning implementations of reinforcement learning algorithms based on temporal-difference learning and weighted k-nearest neighbors for linear function approximation. These algorithms, named kNN-TD(λ) methods, were rigorously tested at the Second and Third Annual Reinforcement Learning Competitions (RLC2008 and RLC2009), held in Helsinki and Montreal respectively, where the kNN-TD(λ) method (JAMH team) won the PolyAthlon domain in 2008, took second place in that domain in 2009, and also took second place in the Mountain Car domain in 2008, showing that it is among the state-of-the-art general-purpose reinforcement learning implementations. These algorithms learn quickly, generalize properly over continuous state spaces, and are robust to a high degree of environmental noise. Furthermore, we describe a derivation of the kNN-TD(λ) algorithm for problems where continuous actions have clear advantages over fine-grained discrete actions: the Ex〈α〉 reinforcement learning algorithm.
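To make the idea concrete, the sketch below pairs TD(λ)-style updates with a weighted k-NN feature map: the k prototypes nearest to the current state receive normalized inverse-distance activations, and those activations serve both as the linear features for the Q-values and as the magnitudes of replacing eligibility traces. Everything here is an illustrative assumption rather than the authors' published implementation: the class name `KNNTDLambda`, the 1/(1+d²) weighting, the SARSA(λ) update rule, and the softmax action-averaging helper (which only gestures at the idea behind Ex〈α〉; the published rule may differ).

```python
import numpy as np

class KNNTDLambda:
    """SARSA(lambda) with weighted k-nearest-neighbor linear function
    approximation, in the spirit of the kNN-TD(lambda) methods described
    above. A minimal sketch under stated assumptions, not the authors' code."""

    def __init__(self, centers, n_actions, alpha=0.1, gamma=1.0, lam=0.9, k=4):
        self.centers = np.asarray(centers, dtype=float)        # prototype states
        self.k = k
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.theta = np.zeros((len(self.centers), n_actions))  # learned weights
        self.e = np.zeros_like(self.theta)                     # eligibility traces

    def _activation(self, state):
        """k nearest prototypes with normalized inverse-distance weights
        (1/(1+d^2) is an assumed weighting in the distance-weighted k-NN style)."""
        d = np.linalg.norm(self.centers - state, axis=1)
        idx = np.argsort(d)[: self.k]
        w = 1.0 / (1.0 + d[idx] ** 2)
        return idx, w / w.sum()

    def q_values(self, state):
        """Q(s, .) as the distance-weighted average over the k neighbors."""
        idx, w = self._activation(state)
        return w @ self.theta[idx]

    def update(self, s, a, r, s_next, a_next, done):
        """One SARSA(lambda) step with replacing eligibility traces."""
        idx, w = self._activation(s)
        q = w @ self.theta[idx, a]
        target = r if done else r + self.gamma * self.q_values(s_next)[a_next]
        delta = target - q
        self.e *= self.gamma * self.lam                   # decay all traces
        self.e[idx, a] = np.maximum(self.e[idx, a], w)    # replacing traces
        self.theta += self.alpha * delta * self.e
        if done:
            self.e[:] = 0.0                               # reset between episodes

    def continuous_action(self, state, action_set, tau=1.0):
        """Illustrative continuous-action rule: a softmax-weighted average of
        a discrete action set. Only a gesture at the Ex<alpha> idea."""
        q = self.q_values(state)
        p = np.exp((q - q.max()) / tau)
        p /= p.sum()
        return p @ np.asarray(action_set, dtype=float)
```

As a usage example, on a task like Mountain Car `centers` could be a regular grid over (position, velocity) and `action_set` the throttle values [-1.0, 0.0, 1.0]; the agent then acts greedily (or softly, via `continuous_action`) on `q_values` and calls `update` after each transition.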
