Learning classifier system with average reward reinforcement learning

Among Learning Classifier Systems, the classifier system XCS is the most widely used and investigated. However, standard XCS has difficulty solving large multi-step problems, where long action chains are needed to obtain delayed rewards. To date, the reinforcement learning component of XCS has been based on Q-learning, which optimizes the discounted total reward received by an agent but, through discounting, tends to limit the length of learnable action chains. Undiscounted alternatives exist, such as R-learning and average reward reinforcement learning in general, which instead optimize the average reward per time step. In this paper, R-learning replaces Q-learning as the reinforcement learning technique employed by XCS. The modification yields a classifier system that learns rapidly and can solve large maze problems. In addition, it produces uniformly spaced payoff levels, which support long action chains and thus effectively prevent overgeneralization.
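
To make the proposed substitution concrete, the sketch below contrasts the two update targets in a plain tabular setting. It is a minimal illustration of R-learning (Schwartz, 1993), not the paper's XCS implementation; the names `ACTIONS`, `r_learning_step`, and the step sizes `alpha` and `beta` are illustrative assumptions.

```python
from collections import defaultdict

ACTIONS = ["N", "S", "E", "W"]   # illustrative maze moves, not from the paper
R = defaultdict(float)           # relative action values R(s, a)
rho = 0.0                        # running estimate of the average reward per step

def r_learning_step(s, a, r, s_next, greedy, alpha=0.1, beta=0.01):
    """One tabular R-learning update (Schwartz, 1993) -- a sketch only.

    Q-learning's target is r + gamma * max_b Q(s', b): repeated discounting
    shrinks distant payoffs geometrically, so payoff levels along a long
    action chain crowd together. R-learning uses the undiscounted target
    r - rho + max_b R(s', b), so successive payoff levels stay uniformly
    spaced along the chain.
    """
    global rho
    best_next = max(R[(s_next, b)] for b in ACTIONS)
    # Move the relative action value toward the average-adjusted target.
    R[(s, a)] += alpha * (r - rho + best_next - R[(s, a)])
    # Conventionally, rho is updated only after greedy (non-exploratory) actions.
    if greedy:
        best_here = max(R[(s, b)] for b in ACTIONS)
        rho += beta * (r + best_next - best_here - rho)
```

In the modified XCS, one would expect the classifier-prediction update for the action set to use the same average-adjusted target in place of Q-learning's discounted one; the sketch shows only the underlying update rule, under that assumption.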
