Biasing Exploration in an Anticipatory Learning Classifier System

This chapter investigates how model learning and behavioral learning can be improved in an anticipatory learning classifier system by biasing exploration. First, the system used, ACS2, is explained. Next, an overview of the possibilities for applying exploration biases in anticipatory learning classifier systems, and in ACS2 specifically, is provided. Two biases are implemented in ACS2: a recency bias, termed action delay bias, and an error bias, termed knowledge array bias. To validate the biases, the system is applied to a dynamic maze task and a hand-eye coordination task. The experiments show that biased exploration enables ACS2 to evolve and adapt its internal environmental model faster. Adaptive behavior also improves.
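
To make the two biases concrete, the following is a minimal Python sketch, not the chapter's implementation. It assumes that each classifier matching the current situation carries a time stamp of its last application and a quality estimate of its anticipations. The names `Classifier`, `last_applied`, `quality`, and `bias_prob`, and the even split between the two biases, are hypothetical choices made for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class Classifier:
    action: int
    last_applied: int  # hypothetical: time step at which the rule was last applied
    quality: float     # hypothetical: accuracy estimate of the rule's anticipation, in [0, 1]


def action_delay_choice(match_set, actions):
    """Recency bias: pick the action tried longest ago in this situation."""
    def latest(action):
        stamps = [c.last_applied for c in match_set if c.action == action]
        # Actions without any matching classifier count as never tried.
        return max(stamps) if stamps else -1
    return min(actions, key=latest)


def knowledge_array_choice(match_set, actions):
    """Error bias: pick the action whose matching classifiers are least accurate."""
    def mean_quality(action):
        qualities = [c.quality for c in match_set if c.action == action]
        # Actions without any matching classifier count as unknown (quality 0).
        return sum(qualities) / len(qualities) if qualities else 0.0
    return min(actions, key=mean_quality)


def choose_action(match_set, actions, bias_prob=0.5, rng=random):
    """Explore uniformly at random, or with probability bias_prob apply one bias."""
    if rng.random() >= bias_prob:
        return rng.choice(actions)
    bias = rng.choice([action_delay_choice, knowledge_array_choice])
    return bias(match_set, actions)


match_set = [
    Classifier(action=0, last_applied=10, quality=0.9),
    Classifier(action=1, last_applied=3, quality=0.2),
    Classifier(action=1, last_applied=7, quality=0.8),
    Classifier(action=2, last_applied=5, quality=0.6),
]
delayed = action_delay_choice(match_set, [0, 1, 2])
least_known = knowledge_array_choice(match_set, [0, 1, 2])
```

In this toy match set the recency bias selects the action whose latest application is oldest, whereas the error bias selects the action with the lowest mean quality, so the two can disagree. Setting `bias_prob` to 0 recovers purely random exploration.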
