Experience-based control and coordination of autonomous mobile systems in dynamic environments

Many real-time machine control skills are too complex and laborious to code by hand; preferably, such skills are acquired by learning algorithms that learn automatically from experience gained through interaction with the machine's environment. Unfortunately, typical learning methods for real-world machine control suffer from a number of problems: huge, high-dimensional state spaces complicate inductive learning, and it is often difficult to gather a sufficient amount of appropriate training data, either because exploration takes too long or because good training examples are extremely hard to obtain through exploration alone. Furthermore, most current learning algorithms rely on a discrete MDP model of the continuous state space, suffer from the incremental summation of errors during learning, and neglect the existence of undesirable states.

The idea behind our approach of experience-based control is to exploit the trajectories of successful explorations to approximate a value function over the state space. To overcome the lack of training data, we employ a realistic neural simulation of the machine's dynamics and introduce suitable exploration techniques, such as backward exploration, to acquire learning data. Combining different exploration techniques allows various types of initial knowledge to be integrated, and undesirable states can be incorporated into the learning model. Since the majority of machine control tasks in technical applications exhibit deterministic behavior, or at least a unimodal probability distribution with small variance, a simple projection function can replace the complex MDP model that was originally designed for discrete state spaces. Our algorithms operate directly in the continuous state space and perform a number of explorations before the collected data are exploited; this is the main reason why our approach is robust against the incremental summation of noise from which conventional learning algorithms often suffer. For the practical and efficient approximation of continuous functions we employ neural networks and networks of radial basis functions.

Our methods have been applied successfully to numerous navigation tasks and to situation-dependent algorithm selection.
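The central building block, approximating a value function over a continuous state space from exploration trajectories with a network of radial basis functions, can be illustrated with a minimal sketch. The class below is an assumption-laden illustration rather than the thesis implementation: it uses Gaussian basis functions with a shared width and fits the weights by linear least squares to (state, cost-to-go) pairs; the names RBFValueFunction, centers, and width are hypothetical.

```python
import numpy as np

class RBFValueFunction:
    """Minimal sketch: a radial-basis-function approximation of a
    cost-to-go value function V(s) over a continuous state space,
    fitted to (state, value) pairs from exploration trajectories.
    (Illustrative only; names and design choices are assumptions.)"""

    def __init__(self, centers, width):
        self.centers = np.asarray(centers, dtype=float)  # (k, d) basis centers
        self.width = float(width)                        # shared Gaussian width
        self.weights = np.zeros(len(self.centers))

    def _features(self, states):
        # Gaussian activations: phi_j(s) = exp(-||s - c_j||^2 / (2 w^2))
        diff = states[:, None, :] - self.centers[None, :, :]
        return np.exp(-np.sum(diff ** 2, axis=2) / (2.0 * self.width ** 2))

    def fit(self, states, values):
        # Linear least squares in the fixed basis-function features.
        phi = self._features(np.asarray(states, dtype=float))
        self.weights, *_ = np.linalg.lstsq(
            phi, np.asarray(values, dtype=float), rcond=None)

    def value(self, state):
        phi = self._features(np.atleast_2d(np.asarray(state, dtype=float)))
        return (phi @ self.weights)[0]


# Example: fit to trajectory data and query an arbitrary state.
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(200, 2))   # states visited by exploration
values = np.linalg.norm(states, axis=1)          # e.g. distance-to-goal cost
V = RBFValueFunction(centers=states[::20], width=0.3)
V.fit(states, values)
print(V.value([0.5, -0.2]))
```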
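Backward exploration is one of the techniques named above for acquiring training data when good examples are hard to find by forward exploration alone. The sketch below assumes a deterministic simulation whose steps can be inverted; inverse_step is a hypothetical helper standing in for the learned neural dynamics model, and the unit step cost is a placeholder for a task-specific cost.

```python
import numpy as np

def backward_explore(goal_state, inverse_step, actions, horizon, rng):
    """Sketch of backward exploration: walk backward from a goal state
    through a deterministic simulation model, labelling each visited
    state with the cost-to-go accumulated so far.

    inverse_step(s, a) -> predecessor state s' such that the forward
    model maps (s', a) to s. This is a hypothetical helper; in practice
    it would wrap the learned neural simulation of the dynamics."""
    s = np.asarray(goal_state, dtype=float)
    states, values = [s.copy()], [0.0]           # the goal has zero cost-to-go
    cost = 0.0
    for _ in range(horizon):
        a = actions[rng.integers(len(actions))]  # random backward action
        s = np.asarray(inverse_step(s, a), dtype=float)
        cost += 1.0                              # unit step cost (placeholder)
        states.append(s.copy())
        values.append(cost)
    return np.array(states), np.array(values)
```

The resulting (state, cost-to-go) pairs are exactly the kind of supervised training data the value-function approximator above consumes; repeating the walk from many goal configurations covers the relevant region of the state space.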
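Once a value function is available, a deterministic projection function can replace the MDP model at decision time: the controller projects each candidate action one step forward and greedily picks the action whose successor state has the lowest approximated cost-to-go. Again a hedged sketch under stated assumptions; project and value_fn are assumed callables, for example the simulation step and the RBF network above.

```python
def greedy_action(state, actions, project, value_fn):
    """Sketch of projection-based action selection: project(s, a) is an
    assumed deterministic one-step dynamics model, value_fn(s) an
    approximated cost-to-go (e.g. RBFValueFunction.value). Returns the
    action whose projected successor state has the lowest cost-to-go."""
    return min(actions, key=lambda a: value_fn(project(state, a)))
```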
