TCS Learning Classifier System Controller on a Real Robot

To date there have been few implementations of Holland's Learning Classifier System (LCS) on real robots. This paper introduces a Temporal Classifier System (TCS), an LCS derived from Wilson's ZCS. Traditional LCS can generalise over the state-action space of a reinforcement learning problem using evolutionary techniques. In TCS this generalisation ability can also be used to determine the state divisions in the state space considered by the LCS. TCS also implements components from Semi-Markov Decision Process (SMDP) theory to weight the influence of time on the reward functions of the LCS. A simple light-seeking task on a real robot platform using TCS is presented, which demonstrates desirable adaptive characteristics for the use of LCS on real robots.
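
The abstract does not give the TCS update rules, so the following is only a minimal sketch of how SMDP theory commonly weights reward by elapsed time, not the paper's method. It assumes the standard continuous-time discount exp(-beta * tau) and a generic temporal-difference update; the names `beta`, `tau`, `reward_rate`, and `alpha` are illustrative assumptions and do not come from the paper.

```python
import math


def smdp_discount(beta: float, tau: float) -> float:
    """Discount applied to value received after an action lasting tau time units.

    Assumption: standard continuous-time discounting with rate beta > 0.
    """
    return math.exp(-beta * tau)


def smdp_target(reward_rate: float, next_value: float, beta: float, tau: float) -> float:
    """Learning target for an action that ran for tau time units.

    Assumption: reward accrues at a constant rate while the action runs, so the
    discounted reward collected is reward_rate * (1 - exp(-beta * tau)) / beta.
    """
    discount = smdp_discount(beta, tau)
    accumulated = reward_rate * (1.0 - discount) / beta
    return accumulated + discount * next_value


def td_update(value: float, target: float, alpha: float) -> float:
    """Move the current estimate a fraction alpha of the way toward the target."""
    return value + alpha * (target - value)


if __name__ == "__main__":
    # The same downstream value is worth less when the action takes longer.
    short = smdp_target(reward_rate=0.0, next_value=100.0, beta=0.1, tau=1.0)
    long = smdp_target(reward_rate=0.0, next_value=100.0, beta=0.1, tau=10.0)
    print(short > long)
```

Under these assumptions, longer-running actions pass back a smaller share of the downstream value, which is one way time can influence the reward signal.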

Larry Bull | Chris Melhuish | Jacob Hurst
