Adding Temporary Memory to ZCS

In a recent article, Wilson (1994) described a "zeroth-level" classifier system (ZCS). ZCS employs a reinforcement learning technique comparable to Q-learning (Watkins, 1989). This article presents results from the first reconstruction of ZCS. Having replicated Wilson's results, we extend ZCS in a manner suggested by Wilson: The original formulation of ZCS has no memory mechanisms, but Wilson (1994b) suggested how internal "temporary memory" registers could be added. We show results from adding one-bit and two-bit memory registers to ZCS. Our results demonstrate that ZCS can exploit memory facilities efficiently in non-Markov environments. We also show that the memoryless ZCS can converge on near-optimal stochastic solutions in non-Markov environments. We then present results from trials using ZCS in Markov environments that require increasingly long chains of actions before reward is received. Our results indicate that inaccurate overgeneral classifiers can interact with the classifier-generation mechanisms to cause catastrophic breakdowns in overall system performance. Basing classifier fitness on accuracy may alleviate this problem. We conclude that the memory mechanism in its current form is unlikely to scale well for situations requiring large amounts of temporary memory. Nevertheless, the ability to find stochastic solutions when there is insufficient memory might offset this problem somewhat.
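As a rough illustration of the kind of memory extension summarized above, the following Python sketch assumes one common way of wiring a temporary memory register into a classifier system. Each condition is a ternary string that covers the sensor bits followed by the register bits. Each classifier also carries an internal action that can set, clear, or leave unchanged each register bit. The names `Classifier`, `matches`, `update_register`, and `match_set`, the bit layout, and the integer action encoding are all hypothetical and are not taken from the article.

```python
from dataclasses import dataclass
from typing import List

WILDCARD = "#"  # "don't care" symbol in conditions; "leave unchanged" in internal actions


@dataclass
class Classifier:
    condition: str        # ternary string: sensor bits followed by memory-register bits (assumed layout)
    external_action: int  # hypothetical encoding of the action taken in the environment
    internal_action: str  # one symbol per register bit: "0", "1", or "#"
    strength: float       # scalar strength used for action selection and credit assignment


def matches(condition: str, message: str) -> bool:
    """Return True if every non-wildcard condition symbol equals the message bit."""
    return len(condition) == len(message) and all(
        c == WILDCARD or c == m for c, m in zip(condition, message)
    )


def update_register(register: str, internal_action: str) -> str:
    """Apply an internal action: '#' keeps the old bit, '0' or '1' overwrites it."""
    return "".join(r if a == WILDCARD else a for r, a in zip(register, internal_action))


def match_set(population: List[Classifier], sensors: str, register: str) -> List[Classifier]:
    """Collect classifiers whose condition matches the sensor bits plus the register bits."""
    message = sensors + register
    return [cl for cl in population if matches(cl.condition, message)]
```

With a one-bit register, two classifiers that see identical sensor input can still propose different actions depending on the register contents. This is the property that lets a system of this kind distinguish states that look the same to its sensors in a non-Markov environment.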

[1] Adaptation, 1926.

[2] John H. Holland, et al. Cognitive systems based on adaptive algorithms, 1977, SGAR.

[3] John H. Holland, et al. Cognitive Systems Based on Adaptive Algorithms, 1978.

[4] Stewart W. Wilson. Knowledge Growth in an Artificial Animal, 1985, ICGA.

[5] John H. Holland, et al. Induction: Processes of Inference, Learning, and Discovery, 1987, IEEE Expert.

[6] David E. Goldberg, et al. Genetic Algorithms in Search, Optimization and Machine Learning, 1988.

[7] David E. Goldberg, et al. A Critical Review of Classifier Systems, 1989, ICGA.

[8] Lashon B. Booker, et al. Triggered Rule Discovery in Classifier Systems, 1989, ICGA.

[9] C. Watkins. Learning from delayed rewards, 1989.

[10] Daniele Montanari, et al. Learning and bucket brigade dynamics in classifier systems, 1990.

[11] Dana H. Ballard, et al. Learning to Perceive and Act, 1990.

[12] Stewart W. Wilson. The animat path to AI, 1991.

[13] P. W. Frey, et al. Letter recognition using Holland-style adaptive classifiers, 1991, Machine Learning.

[14] Rodney A. Brooks, et al. Artificial Life and Real Robots, 1992.

[15] Thomas S. Collett, et al. Landmark learning and guidance in insects, 1992.

[16] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.

[17] Derek F. Yates, et al. An Investigation into Possible Causes of and Solutions to Rule Strength Distortion Due to the Bucket Brigade Algorithm, 1993, ICGA.

[18] John R. Koza, et al. Genetic programming - on the programming of computers by means of natural selection, 1993, Complex adaptive systems.

[19] Marco Dorigo, et al. Genetic and Non-Genetic Operators in ALECSYS, 1993, Evolutionary Computation.

[20] Dave Cliff, et al. Adding "Foveal Vision" to Wilson's Animat, 1993, Adapt. Behav.

[21] Michael L. Littman, et al. Memoryless policies: theoretical limitations and practical results, 1994.

[22] Maja J. Mataric, et al. Reward Functions for Accelerated Learning, 1994, ICML.

[23] Marco Colombetti, et al. Training Agents to Perform Sequential Behavior, 1994, Adapt. Behav.

[24] Stewart W. Wilson. ZCS: A Zeroth Level Classifier System, 1994, Evolutionary Computation.

[25] Marco Dorigo, et al. A comparison of Q-learning and classifier systems, 1994.

[26] Stewart W. Wilson. Classifier Fitness Based on Accuracy, 1995, Evolutionary Computation.

[27] Long Ji Lin, et al. Reinforcement Learning of Non-Markov Decision Processes, 1995, Artif. Intell.

[28] Stanley J. Rosenschein, et al. Reinforcement learning of non-Markov decision processes, 1996.