Plan Projection, Execution, and Learning for Mobile Robot Control

Most state-of-the-art hybrid control systems for mobile robots are decomposed into different layers. While the deliberation layer reasons about the actions the robot must take to achieve a given goal, the behavioral layer is designed to enable the robot to react quickly to unforeseen events. This decomposition guarantees safe operation even in the presence of unforeseen and dynamic obstacles and enables the robot to cope with situations it was not explicitly programmed for. The layered design, however, also leaves us with the problem of plan execution, that is, the problem of arbitrating between the deliberation layer and the behavioral layer. Abstract symbolic actions have to be translated into streams of local control commands. Simultaneously, execution failures have to be handled on an appropriate level of abstraction. It is now widely accepted that plan execution should form a third layer of a hybrid robot control system. The resulting layered architectures are called three-tiered architectures, or 3T architectures for short. Although many high-level programming frameworks have been proposed to support the implementation of the intermediate layer, there is no generally accepted algorithmic basis for plan execution in three-tiered architectures. In this thesis, we propose to base plan execution on plan projection and learning and present a general framework for the self-supervised improvement of plan execution. This framework has been implemented in Appeal, an Architecture for Plan Projection, Execution And Learning, which extends the well-known Rhino control system by introducing an execution layer. This thesis contributes to the field of plan-based mobile robot control, which investigates the interrelation between planning, reasoning, and learning techniques based on an explicit representation of the robot’s intended course of action, a plan.
In McDermott’s terminology, a plan is that part of a robot control program that the robot can not only execute but also reason about and manipulate. According to that broad view, a plan may serve many purposes in a robot control system, such as reasoning about future behavior, revising intended activities, or learning. In this thesis, plan-based control is applied to the self-supervised improvement of mobile robot plan execution.
