Learning Strategies for Mid-Level Robot Control: Some Preliminary Considerations and Experiments

Versatile robots will need to be programmed, of course. But beyond explicit programming by a programmer, they will need to be able to plan how to perform new tasks and how to perform old tasks under new circumstances. They will also need to be able to learn. In this article, I concentrate on two types of learning, namely supervised learning and reinforcement learning of robot control programs. I also argue that it would be useful for all of these programs, those explicitly programmed, those planned, and those learned, to be expressed in a common language. I propose what I think is a good candidate for such a language, namely the formalism of teleo-reactive (T-R) programs. Most of the article deals with the problem of learning T-R programs. I assert that such programs are PAC learnable and then describe some techniques for learning them and the results of some preliminary learning experiments. The work on learning T-R programs is at a very early stage, but I believe enough has been done to warrant further development and experimentation. For that reason I make this article available on the web, but I caution readers about the tentative nature of this work. I solicit comments and suggestions at:
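To make the T-R formalism concrete, the following is a minimal Python sketch of a T-R program interpreter, written against the published T-R semantics: a program is an ordered list of condition-action rules, the topmost rule encodes the goal, and on each sense-act cycle the action of the first rule whose condition holds is executed. The function names (tr_step, run_demo) and the toy one-dimensional world model are hypothetical illustrations of my own, not code from the article.

```python
# A minimal sketch of a teleo-reactive (T-R) interpreter, assuming a
# T-R program is represented as an ordered list of (condition, action)
# pairs, goal rule first. Names and the demo world are hypothetical.

from typing import Callable, List, Tuple

Condition = Callable[[dict], bool]   # predicate over the perceived world state
Action = Callable[[dict], None]      # durative action; re-issued each cycle

def tr_step(program: List[Tuple[Condition, Action]], state: dict) -> bool:
    """One sense-act cycle: fire the action of the FIRST rule whose
    condition holds. Returns False when the top (goal) rule fires,
    signalling that the goal condition is satisfied."""
    for i, (condition, action) in enumerate(program):
        if condition(state):
            if i == 0:               # rule 0 is the goal; its action is nil
                return False
            action(state)
            return True
    raise RuntimeError("no applicable rule; a complete T-R program "
                       "should always have a true condition")

def run_demo() -> None:
    # Hypothetical task: move a robot along one axis to a target position.
    state = {"x": 0.0, "target": 5.0}
    program: List[Tuple[Condition, Action]] = [
        (lambda s: abs(s["x"] - s["target"]) < 0.1,          # goal reached
         lambda s: None),
        (lambda s: s["x"] < s["target"],                     # move right
         lambda s: s.update(x=s["x"] + 0.5)),
        (lambda s: True,                                     # default: move left
         lambda s: s.update(x=s["x"] - 0.5)),
    ]
    while tr_step(program, state):
        pass
    print("reached target:", state["x"])

if __name__ == "__main__":
    run_demo()
```

In a real robot the selected action would run continuously for as long as its rule remains the first with a true condition; re-evaluating the whole rule list on every cycle, as this sketch does, approximates that circuit-like semantics at discrete time steps.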
