Anticipatory Learning Classifier Systems and Factored Reinforcement Learning

Factored Reinforcement Learning (FRL) is a new technique to solve Factored Markov Decision Problems (FMDPs) when the structure of the problem is not known in advance. Like Anticipatory Learning Classifier Systems (ALCSs), it is a model-based Reinforcement Learning approach that includes generalization mechanisms in the presence of a structured domain. In general, FRL and ALCSs are explicit, state-anticipatory approaches that learn generalized state transition models to improve system behavior based on model-based reinforcement learning techniques. In this contribution, we highlight the conceptual similarities and differences between FRL and ALCSs, focusing on SPITI, an instance of an FRL method, on the one hand, and on the ALCSs MACS and XACS on the other. Though FRL systems seem to benefit from a clearer theoretical grounding, an empirical comparison between SPITI and XACS on two benchmark problems reveals that the latter scales much better than the former when some combinations of state variables do not occur. Based on this finding, we discuss the mechanisms in XACS that result in its better scalability and propose importing these mechanisms into FRL systems.
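Both FRL and ALCSs are described above as learning a generalized state transition model and then planning on it. The sketch below is a rough, hypothetical illustration of that idea. It is not the representation used by SPITI, MACS, or XACS, and it is not the scalability mechanism the paper identifies. It assumes binary state variables encoded as strings and a deterministic model, whereas FMDPs are in general stochastic. A hypothetical `Rule` type encodes ACS-style condition-action-effect rules, where `#` means "any value" in a condition and "unchanged" in an effect. Value iteration then runs only over the states that the model can reach from a start state. This is one simple way a model-based learner can avoid paying for combinations of state variables that never occur. All names (`Rule`, `predict`, `reachable_states`, `value_iteration`) and the toy rule set are invented for illustration.

```python
from dataclasses import dataclass

WILDCARD = "#"  # in a condition: "any value"; in an effect: "unchanged"


@dataclass(frozen=True)
class Rule:
    """Hypothetical anticipatory rule: condition, action, predicted effect."""
    condition: str
    action: int
    effect: str

    def matches(self, state: str, action: int) -> bool:
        # A rule applies if the action agrees and every specified bit matches.
        return action == self.action and all(
            c == WILDCARD or c == s for c, s in zip(self.condition, state)
        )

    def anticipate(self, state: str) -> str:
        # '#' in the effect means the variable keeps its current value.
        return "".join(s if e == WILDCARD else e for e, s in zip(self.effect, state))


def predict(rules, state, action):
    """Next state anticipated by the first matching rule, or None if no rule applies."""
    for rule in rules:
        if rule.matches(state, action):
            return rule.anticipate(state)
    return None


def reachable_states(rules, start, actions):
    """Collect only the states the learned model can reach from `start`."""
    seen, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for a in actions:
            nxt = predict(rules, state, a)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def value_iteration(rules, states, actions, reward, is_goal, gamma=0.9, sweeps=100):
    """Deterministic value iteration restricted to `states` (goal states absorb with value 0)."""
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            if is_goal(s):
                continue
            backups = [
                reward(n) + gamma * values[n]
                for n in (predict(rules, s, a) for a in actions)
                if n is not None
            ]
            values[s] = max(backups, default=0.0)
    return values


# Toy model over three binary variables x0 x1 x2 (hypothetical):
# action 0 sets x0; action 1 sets x1 only if x0 = 1; action 2 sets x2 only if x1 = 1.
rules = [
    Rule("0##", 0, "1##"), Rule("1##", 0, "###"),
    Rule("1##", 1, "#1#"), Rule("0##", 1, "###"),
    Rule("#1#", 2, "##1"), Rule("#0#", 2, "###"),
]
actions = [0, 1, 2]
states = reachable_states(rules, "000", actions)
values = value_iteration(
    rules, states, actions,
    reward=lambda s: 1.0 if s == "111" else 0.0,
    is_goal=lambda s: s == "111",
)
print(sorted(states))
print({s: round(v, 3) for s, v in sorted(values.items())})
```

In this toy model the planner only touches 4 of the 2**3 = 8 possible states (`000`, `100`, `110`, `111`), because the remaining combinations never arise under the learned rules. A planner that enumerates the full factored state space would pay for all 8 regardless.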
