Module-Based Reinforcement Learning: Experiments with a Real Robot

The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. In contrast, robot-learning domains are inherently continuous in both time and space, and are moreover partially observable. Here we suggest a systematic approach to solving such problems in which the available qualitative and quantitative knowledge is used to reduce the complexity of the learning task. The steps of the design process are to: i) decompose the task into subtasks using the qualitative knowledge at hand; ii) design local controllers that solve the subtasks using the available quantitative knowledge; and iii) learn to coordinate these controllers by means of reinforcement learning. We argue that this approach enables fast, semi-automatic, yet high-quality robot control, since no fine-tuning of the local controllers is needed. The approach was verified on a non-trivial real-life robot task. Several RL algorithms were compared by ANOVA, and the model-based approach was found to work significantly better than the model-free approach. The learnt switching strategy performed comparably to a handcrafted version. Moreover, the learnt strategy appeared to exploit certain properties of the environment that were not foreseen in advance, supporting the view that adaptive algorithms have an advantage over non-adaptive ones in complex environments.
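
The three-step design process described above lends itself to a compact illustration. The Python sketch below shows one way the coordination step could be realized: a tabular Q-learning agent that chooses which local controller (module) to activate in each qualitatively discretized state. The module names, feature encoding, and hyperparameters are illustrative assumptions, not the implementation used in the paper, which also includes a model-based variant.

```python
import random
from collections import defaultdict


class SwitchingQLearner:
    """Tabular Q-learning over which local controller (module) to activate.

    A minimal sketch: states are assumed to be small tuples of qualitative
    features, and actions are choices among a fixed set of handcrafted modules.
    """

    def __init__(self, modules, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.modules = modules          # list of controller names
        self.alpha = alpha              # learning rate
        self.gamma = gamma              # discount factor
        self.epsilon = epsilon          # exploration rate
        self.q = defaultdict(float)     # Q[(state, module)] -> estimated value

    def select_module(self, state):
        # Epsilon-greedy choice among modules (not among low-level actions).
        if random.random() < self.epsilon:
            return random.choice(self.modules)
        return max(self.modules, key=lambda m: self.q[(state, m)])

    def update(self, state, module, reward, next_state):
        # One-step Q-learning update on the switching policy.
        best_next = max(self.q[(next_state, m)] for m in self.modules)
        target = reward + self.gamma * best_next
        self.q[(state, module)] += self.alpha * (target - self.q[(state, module)])


# Hypothetical usage with made-up module names and a toy feature tuple:
learner = SwitchingQLearner(modules=["avoid_obstacle", "grasp", "go_to_goal"])
state = ("obstacle_near", "gripper_empty")
module = learner.select_module(state)
# ...run the chosen local controller until it terminates, observe a reward...
learner.update(state, module, reward=0.0,
               next_state=("obstacle_far", "gripper_empty"))
```

Because learning takes place only over the small discrete space of module choices, rather than over the continuous sensor and actuator spaces, the switching policy can be learned quickly even on a real robot.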
