Rapid, safe, and incremental learning of navigation strategies

In this paper we propose a reinforcement connectionist learning architecture that allows an autonomous robot to acquire efficient navigation strategies in a few trials. Besides rapid learning, the architecture has three further appealing features. First, the robot improves its performance incrementally as it interacts with an initially unknown environment, and it ends up learning to avoid collisions even in those situations in which its sensors cannot detect the obstacles. This is a definite advantage over nonlearning reactive robots. Second, since it learns from basic reflexes, the robot is operational from the very beginning and the learning process is safe. Third, the robot exhibits high tolerance to noisy sensory data and good generalization abilities. All these features make this learning robot's architecture very well suited to real-world applications. We report experimental results obtained with a real mobile robot in an indoor environment that demonstrate the appropriateness of our approach to real autonomous robot control.
