Learning Robot Control - Using Control Policies as Abstract Actions

Autonomous robot systems operating in uncertain environments have to cope with new situations and changing task requirements. The control architecture of such a system should therefore be reactive, allow flexible responses to novel situations, and adapt to longer-lasting changes in the environment or the task requirements. In the extreme case, this learning has to occur without the direct influence of an outside teacher, which makes the reinforcement learning paradigm attractive, since it allows sequences of behavior to be learned from simple reinforcement signals [1, 17]. However, while these techniques have been applied to simple robot systems and in simulation [2, 5, 6, 7, 10, 11, 12], the complexity of the primitive action and state spaces of most robots means that large amounts of experience are needed to learn a given task, rendering these methods impractical for on-line learning on such systems. Furthermore, most such learning systems provide no means of introducing a priori knowledge and thus permit catastrophic failures, which is often unacceptable in real-world systems that have to learn new tasks in a single trial.

To address these issues, the control architecture presented here uses more abstract actions, which allow the system to be defined as a Discrete Event Dynamic System (DEDS) on an abstract, discrete state space within which a policy for the given task is learned. To illustrate this, the architecture has been applied to walking tasks on a four-legged walking robot.

The use of abstract actions within the reinforcement learning framework [14] promises to make it possible to address more complex tasks and platforms. Much of this promise stems from the possibility of treating the resulting system as an event-driven system rather than a clock-driven one, reducing the set of points at which the learning agent has to consider a new action to the times at which certain control or sensor events occur. While this allows optimal decision points to be missed if the corresponding sensor signals lie outside the scope of the current set of control and sensor alternatives, it also provides a focus of attention and can dramatically reduce the complexity of the learning problem.
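The key mechanism described above, closed-loop controllers serving as abstract actions, with the learner making a decision only when a control or sensor event terminates the active controller, can be illustrated with a small semi-Markov Q-learning sketch. This is a minimal illustration under assumed names, not the paper's implementation: `Controller`, `run_until_event`, `smdp_q_learning`, and all parameter values are hypothetical.

```python
# Minimal sketch (assumed names, not the paper's implementation):
# SMDP-style Q-learning in which every action is a closed-loop controller
# that runs until a discrete event terminates it, so the learner only makes
# decisions at event boundaries rather than at every clock tick.

import random
from collections import defaultdict


class Controller:
    """Abstract action: a feedback control policy plus its termination event."""

    def __init__(self, name, run_until_event):
        # run_until_event(state) -> (next_state, reward, n_steps), where
        # reward is the return accumulated while the controller was active
        # and n_steps is the number of primitive steps it consumed.
        self.name = name
        self.run_until_event = run_until_event


def smdp_q_learning(start_state, controllers, goal_test,
                    episodes=200, max_events=50,
                    alpha=0.1, gamma=0.95, epsilon=0.1):
    """Q-learning over abstract (state, controller) pairs on a discrete state space."""
    Q = defaultdict(float)

    def greedy(state):
        return max(controllers, key=lambda c: Q[(state, c.name)])

    for _ in range(episodes):
        state = start_state
        for _ in range(max_events):
            if goal_test(state):
                break
            # epsilon-greedy choice among the available controllers
            c = random.choice(controllers) if random.random() < epsilon else greedy(state)
            next_state, reward, n_steps = c.run_until_event(state)
            best_next = max(Q[(next_state, c2.name)] for c2 in controllers)
            # Semi-MDP backup: discount by the variable duration of the controller
            target = reward + (gamma ** n_steps) * best_next
            Q[(state, c.name)] += alpha * (target - Q[(state, c.name)])
            state = next_state
    return Q
```

Because each controller consumes a variable number of primitive steps, the backup discounts by that elapsed duration; the learner then reasons over the small, discrete event space rather than the robot's raw state and action spaces.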

[1] Richard S. Sutton, et al. Neuronlike adaptive elements that can solve difficult learning control problems, 1983, IEEE Transactions on Systems, Man, and Cybernetics.

[2] C. Watkins. Learning from delayed rewards, 1989.

[3] W. M. Wonham, et al. The control of discrete event systems, 1989.

[4] Richard S. Sutton, et al. Dyna, an integrated architecture for learning, planning, and reacting, 1990, SIGART Bulletin.

[5] Long-Ji Lin, et al. Reinforcement learning for robots using neural networks, 1992.

[6] Sridhar Mahadevan, et al. Automatic Programming of Behavior-Based Robots Using Reinforcement Learning, 1991, Artif. Intell.

[7] Vijaykumar Gullapalli, et al. Reinforcement learning and its application to control, 1992.

[8] C. Atkeson, et al. Prioritized Sweeping: Reinforcement Learning with Less Data and Less Time, 1993, Machine Learning.

[9] Roderic A. Grupen, et al. The applications of harmonic functions to robotics, 1993, J. Field Robotics.

[10] Andrew G. Barto, et al. Improving Elevator Performance Using Reinforcement Learning, 1995, NIPS.

[11] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.

[12] José del R. Millán, et al. Rapid, safe, and incremental learning of navigation strategies, 1996, IEEE Trans. Syst. Man Cybern. Part B.

[13] R. A. Grupen, et al. A Hybrid Discrete Event Dynamic Systems Approach to Robot Control, 1996.

[14] Doina Precup, et al. Multi-time Models for Temporally Abstract Planning, 1997, NIPS.

[15] J. A. Coelho, et al. A Control Basis for Learning Multifingered Grasps, 1997.

[16] Maja J. Mataric, et al. Reinforcement Learning in the Multi-Robot Domain, 1997, Auton. Robots.

[17] Roderic A. Grupen, et al. A Control Structure For Learning Locomotion Gaits, 1998.