Learning situation-dependent costs: Improving planning from probabilistic robot execution

Abstract: Physical domains are notoriously hard to model completely and correctly, especially when it comes to capturing the dynamics of the environment. In this article, we present Rogue, a robot that learns from its execution experiences. Since actions may have different costs under different conditions, we introduce the concept of situation-dependent rules, in which situational features are attached to costs or probabilities, reflecting the patterns and dynamics encountered in the environment. Rogue extracts learning opportunities from massive, continual, probabilistic execution traces. It then correlates these learning opportunities with environmental features, creating situation-dependent costs for its actions. We present the development and use of these rules for a robotic path planner. Our empirical results show that situation-dependent rules effectively improve the planner's model of the environment, allowing the planner to predict and avoid failures, to create plans that are tailored to the real world, and to respond to a changing environment.
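The loop the abstract describes — extract cost observations from execution traces, correlate them with situational features, and feed the learned situation-dependent costs back to the path planner — can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: all names are hypothetical, and a scikit-learn regression tree stands in for whatever learner Rogue actually uses.

```python
# Minimal sketch of situation-dependent cost learning (hypothetical names,
# not the paper's code): observed traversal costs from execution traces are
# correlated with situational features, and the learned model overrides a
# path planner's default arc costs.
from dataclasses import dataclass
from sklearn.tree import DecisionTreeRegressor

@dataclass
class TraceEvent:
    arc_id: int             # which map arc the robot traversed
    features: list[float]   # situational features, e.g. [hour_of_day, door_open]
    observed_cost: float    # actual traversal cost recorded in the trace

def learn_cost_rules(events: list[TraceEvent]) -> dict[int, DecisionTreeRegressor]:
    """Fit one cost model per arc from accumulated execution traces."""
    by_arc: dict[int, list[TraceEvent]] = {}
    for e in events:
        by_arc.setdefault(e.arc_id, []).append(e)
    models = {}
    for arc_id, evs in by_arc.items():
        X = [e.features for e in evs]
        y = [e.observed_cost for e in evs]
        models[arc_id] = DecisionTreeRegressor(max_depth=3).fit(X, y)
    return models

def arc_cost(models, arc_id, features, default_cost):
    """Situation-dependent cost; fall back to the default when no rule exists."""
    model = models.get(arc_id)
    return default_cost if model is None else float(model.predict([features])[0])
```

At query time the planner would call arc_cost() instead of reading a static cost table, so that, for example, a corridor arc becomes expensive when the features indicate a situation in which it was historically slow to traverse.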
