XCS with eligibility traces

The development of the XCS Learning Classifier System has produced a robust and stable implementation that performs competitively in direct-reward (single-step) environments. Although investigations in delayed-reward (i.e. multi-step) environments have shown promise, XCS still struggles to find optimal solutions efficiently in environments with long action-chains. This paper highlights the strong relation of XCS to reinforcement learning and identifies some of the major differences. This makes it possible to add eligibility traces to XCS: a technique taken from reinforcement learning that updates the predictions of the whole action-chain on each step, which should make prediction updates faster and more accurate. However, it is shown that the discrete nature of a classifier's condition representation and the operation of the genetic algorithm cause traces to propagate incorrect prediction values back along the chain, in some cases resulting in a decrease in system performance. As a result, further investigation of the existing approach to generalisation is proposed.
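To illustrate the mechanism the abstract refers to, the following is a minimal sketch of eligibility traces in plain tabular TD(λ) on a deterministic corridor task. This is not the paper's XCS implementation (XCS carries the prediction in classifiers, not a state-value table); the function name, environment, and parameter values are illustrative assumptions. The point it demonstrates is the one the abstract relies on: on every step, the trace vector lets a single TD error update the predictions of all states visited earlier in the chain, rather than only the most recent one.

```python
import numpy as np

def td_lambda_chain(n_states=10, episodes=200, alpha=0.1, gamma=0.9, lam=0.8):
    """TD(lambda) prediction on a simple corridor: states 0..n_states-1,
    the agent always steps right, and entering the final state pays reward 1.
    Eligibility traces propagate each TD error back along the whole visited
    chain on every step, instead of one state per episode."""
    V = np.zeros(n_states)                      # value estimate per state
    for _ in range(episodes):
        e = np.zeros(n_states)                  # eligibility trace per state
        s = 0
        while s < n_states - 1:
            s_next = s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            delta = r + gamma * V[s_next] - V[s]  # one-step TD error
            e[s] = 1.0                            # replacing trace for current state
            V += alpha * delta * e                # update every traced state at once
            e *= gamma * lam                      # decay all traces
            s = s_next
    return V
```

With λ = 0, only the most recently visited state is updated and the reward takes one extra episode per state to propagate back; with λ > 0 the estimates along the whole chain improve within each episode, which is the speed-up the paper seeks to transfer to XCS.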
