Similarity-Aware Kanerva Coding for On-Line Reinforcement Learning

A major challenge in reinforcement learning (RL) is that tabular representations of learned policies do not scale to problems with large numbers of states or state-action pairs. Function approximation is a promising tool to overcome this limitation: it represents learned knowledge with parameterized functions instead of a table and thereby enables generalization. However, existing schemes fall short on realistic RL problems, whose demands on approximation accuracy and efficiency grow rapidly. In this paper, we extend the architecture of Sparse Distributed Memories (SDMs) and propose a novel on-line methodology, similarity-aware Kanerva coding (SAK), that closely represents the learned knowledge of very large-scale problems with significantly fewer parameterized components. SAK directly measures the real distances between state variables in all dimensions and reformulates the state similarity metric with an improved definition of state closeness. As a result, our scheme accurately distributes and generalizes knowledge among related states. We further improve SAK's efficiency by activating only a limited number of sufficiently similar prototype states for value approximation, which reduces the risk of over-generalization. In addition, SAK eliminates size tuning and prototype reallocation for the prototype set, resulting not only in broader scalability but also in significant savings in the number of prototypes and the computational overhead needed for RL. Our extensive experimental results show that SAK improves learning quality by more than 48% over existing schemes, and that it consistently learns good policies with small overhead and short training times, even with roughly tuned scheme parameters.
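The abstract describes value approximation over a set of prototype states whose activation is driven by a real-valued, per-dimension similarity measure rather than binary Hamming-style matching. Below is a minimal sketch of how such a scheme might be organized, assuming a linear approximator, an exponential similarity kernel, and a fixed activation limit k; the class name, kernel choice, and parameters are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of similarity-aware Kanerva coding (SAK).
# The kernel, activation rule, and hyperparameters are assumptions for illustration.
import numpy as np

class SAKApproximator:
    def __init__(self, prototypes, learning_rate=0.1, k_active=8):
        self.prototypes = np.asarray(prototypes, dtype=float)  # (P, D) prototype states
        self.theta = np.zeros(len(self.prototypes))            # one weight per prototype
        self.alpha = learning_rate
        self.k_active = k_active                               # cap on activated prototypes

    def similarity(self, state):
        # Measure real-valued distance in every state dimension, then map the
        # summed distance to a similarity score in (0, 1].
        dist = np.abs(self.prototypes - state).sum(axis=1)
        return np.exp(-dist)

    def active_set(self, state):
        # Activate only the k most similar prototypes to limit over-generalization.
        sim = self.similarity(state)
        idx = np.argsort(sim)[-self.k_active:]
        return idx, sim[idx]

    def value(self, state):
        # Approximate the value as a similarity-weighted sum of activated weights.
        idx, sim = self.active_set(state)
        return float(np.dot(sim, self.theta[idx]))

    def update(self, state, td_error):
        # Distribute a TD error among activated prototypes in proportion
        # to their similarity (gradient of the linear approximation).
        idx, sim = self.active_set(state)
        self.theta[idx] += self.alpha * td_error * sim
```

In this reading, the fixed prototype set and the top-k activation rule stand in for the paper's claims of eliminating size tuning and prototype reallocation while bounding how far knowledge generalizes per update.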
