XCS with eligibility traces

The development of the XCS Learning Classifier System has produced a robust and stable implementation that performs competitively in direct-reward (single-step) environments. Although investigations in delayed-reward (i.e. multi-step) environments have shown promise, XCS still struggles to find optimal solutions efficiently in environments with long action-chains. This paper highlights the strong relation of XCS to reinforcement learning and identifies some of the major differences. This makes it possible to add eligibility traces to XCS: a technique taken from reinforcement learning that updates the predictions of the whole action-chain on each step, which should make prediction updates faster and more accurate. However, it is shown that the discrete nature of a classifier's condition representation and the operation of the genetic algorithm cause traces to propagate incorrect prediction values back along the chain, in some cases resulting in a decrease in system performance. As a result, further investigation of the existing approach to generalisation is proposed.
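To illustrate the mechanism the abstract refers to, the following is a minimal sketch of eligibility traces in plain tabular TD(λ) on a deterministic corridor task. This is not the paper's XCS implementation (XCS carries the prediction in classifiers, not a state-value table); the function name, environment, and parameter values are illustrative assumptions. The point it demonstrates is the one the abstract relies on: on every step, the trace vector lets a single TD error update the predictions of all states visited earlier in the chain, rather than only the most recent one.

```python
import numpy as np

def td_lambda_chain(n_states=10, episodes=200, alpha=0.1, gamma=0.9, lam=0.8):
    """TD(lambda) prediction on a simple corridor: states 0..n_states-1,
    the agent always steps right, and entering the final state pays reward 1.
    Eligibility traces propagate each TD error back along the whole visited
    chain on every step, instead of one state per episode."""
    V = np.zeros(n_states)                      # value estimate per state
    for _ in range(episodes):
        e = np.zeros(n_states)                  # eligibility trace per state
        s = 0
        while s < n_states - 1:
            s_next = s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            delta = r + gamma * V[s_next] - V[s]  # one-step TD error
            e[s] = 1.0                            # replacing trace for current state
            V += alpha * delta * e                # update every traced state at once
            e *= gamma * lam                      # decay all traces
            s = s_next
    return V
```

With λ = 0, only the most recently visited state is updated and the reward takes one extra episode per state to propagate back; with λ > 0 the estimates along the whole chain improve within each episode, which is the speed-up the paper seeks to transfer to XCS.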
