Novel Function Approximation Techniques for Large-scale Reinforcement Learning

Function approximation can be used to improve the performance of reinforcement learners. Traditional techniques, including Tile Coding and Kanerva Coding, can give poor performance when applied to large-scale problems. In our preliminary work, we show that this poor performance is caused by prototype collisions and uneven prototype visit frequency distributions. We describe our adaptive Kanerva-based function approximation algorithm, based on dynamic prototype allocation and adaptation. We show that probabilistic prototype deletion with prototype splitting can make the distribution of visit frequencies more uniform, and that dynamic prototype allocation and adaptation can reduce prototype collisions. This approach can significantly improve the performance of a reinforcement learner. We then show that fuzzy Kanerva-based function approximation can reduce the similarity between the membership vectors of state-action pairs, giving even better results. We use Maximum Likelihood Estimation to adjust the variances of basis functions and tune the receptive fields of prototypes. This approach completely eliminates prototype collisions and greatly improves the ability of a Kanerva-based reinforcement learner to solve large-scale problems. Since the number of prototypes remains hard to select, we describe a more effective approach for adaptively selecting the number of prototypes. Our new rough sets-based Kanerva function approximation uses rough sets theory to explain how prototype
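As a concrete illustration of the ideas above, the following is a minimal sketch of plain Kanerva coding for Q-value approximation, together with the visit-frequency bookkeeping that a prototype deletion/splitting step can rely on. It assumes a binary state-action encoding; all names, thresholds, and update rules here are illustrative assumptions, not the implementation described in this work.

```python
# Sketch of Kanerva coding for Q-value approximation with visit-frequency
# tracking and a simple prototype deletion/splitting step (assumptions only).
import numpy as np

rng = np.random.default_rng(0)

n_prototypes = 50   # size of the prototype set (assumed)
state_bits = 16     # length of the binary state-action encoding (assumed)
radius = 3          # prototypes within this Hamming distance are "active"

prototypes = rng.integers(0, 2, size=(n_prototypes, state_bits))
theta = np.zeros(n_prototypes)    # one learned weight per prototype
visits = np.zeros(n_prototypes)   # visit counts driving deletion/splitting


def features(sa):
    """Binary membership vector: 1 for each prototype within the Hamming radius."""
    dist = np.count_nonzero(prototypes != sa, axis=1)
    return (dist <= radius).astype(float)


def q_value(sa):
    """Approximate Q(s, a) as the sum of the weights of the active prototypes."""
    return features(sa) @ theta


def td_update(sa, target, alpha=0.1):
    """One gradient-descent TD step on the prototypes activated by (s, a)."""
    phi = features(sa)
    visits[:] = visits + phi                          # accumulate visit frequencies
    theta[:] = theta + alpha * (target - phi @ theta) * phi


def adapt_prototypes(delete_prob=0.1):
    """Probabilistically delete rarely visited prototypes and split busy ones,
    pushing the visit-frequency distribution toward uniform (sketch only)."""
    mean_visits = visits.mean() + 1e-9
    for i in range(n_prototypes):
        if visits[i] < 0.1 * mean_visits and rng.random() < delete_prob:
            j = int(np.argmax(visits))                # most-visited prototype
            child = prototypes[j].copy()
            child[rng.integers(0, state_bits)] ^= 1   # split: flip one bit
            prototypes[i] = child
            theta[i] = theta[j]
            visits[i] = visits[j] = visits[j] / 2.0
```

A fuzzy variant, in the spirit of the fuzzy Kanerva-based approximation described above, would replace the hard Hamming-radius activation with graded Gaussian membership values whose variances are tuned (for example by Maximum Likelihood Estimation), so that every prototype contributes a weighted amount to each state-action pair rather than colliding on identical binary membership vectors.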
