Imitation-guided learning in learning classifier systems

In this paper, we study how an imitation process can be developed to improve learning in the framework of learning classifier systems. We present three approaches to taking an observed behavior into account through a guidance interaction: two that build a model of this behavior, and one that does not. These approaches are evaluated and compared in different environments when applied to three major classifier systems: ZCS, XCS, and ACS. The results, which we analyze and discuss, highlight the importance of modeling the observed behavior to enable efficient imitation, show the advantages of exploiting this model through a specialized internal action, and provide new comparisons between ZCS, XCS, and ACS.
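To make the core idea concrete, here is a minimal sketch of how guidance by imitation can bias a strength-based classifier system. All names and parameters (`TinyImitationLCS`, `bonus`, `lr`) are illustrative assumptions, not the paper's actual algorithms: the toy agent simply strengthens the rule that reproduces a mentor's observed action, alongside ordinary reinforcement updates.

```python
class TinyImitationLCS:
    """Toy strength-based classifier system (ZCS-flavored) with a
    simple model-free imitation bias: each observed mentor
    (state, action) pair raises the strength of the matching rule.
    Illustrative only; not the paper's algorithm."""

    def __init__(self, actions, lr=0.2, bonus=0.5):
        self.strengths = {}   # (state, action) -> strength
        self.actions = actions
        self.lr = lr          # learning rate for reward updates
        self.bonus = bonus    # imitation guidance increment

    def act(self, state):
        # Greedy action selection over rule strengths (default 0).
        return max(self.actions,
                   key=lambda a: self.strengths.get((state, a), 0.0))

    def learn(self, state, action, reward):
        # Standard strength update toward the received reward.
        key = (state, action)
        s = self.strengths.get(key, 0.0)
        self.strengths[key] = s + self.lr * (reward - s)

    def observe(self, state, mentor_action):
        # Guidance interaction: strengthen the classifier that
        # reproduces the mentor's action in the observed state.
        key = (state, mentor_action)
        self.strengths[key] = self.strengths.get(key, 0.0) + self.bonus


agent = TinyImitationLCS(actions=["left", "right"])
agent.observe("s0", "right")          # mentor demonstrates "right"
print(agent.act("s0"))                # biased toward the mentor
agent.learn("s0", "left", reward=10)  # own experience can override it
print(agent.act("s0"))
```

The guidance term only biases action selection; the agent's own reward updates can still override an imitated preference, which is the usual motivation for combining demonstration with reinforcement learning rather than replacing it.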
