Tractable planning under uncertainty: exploiting structure

The problem of planning under uncertainty has received significant attention in the scientific community over the past few years. It is now well recognized that accounting for uncertainty during planning and decision-making is imperative to the design of robust computer systems. This is particularly crucial in robotics, where the ability to interact effectively with real-world environments is a prerequisite for success. The Partially Observable Markov Decision Process (POMDP) provides a rich framework for planning under uncertainty. The POMDP model can optimize sequences of actions that are robust to sensor noise, missing information, occlusion, and imprecise actuation. While the model is rich enough to address most robotic planning problems, exact solutions are intractable for all but the smallest problems. This thesis argues that large POMDP problems can be solved by exploiting natural structural constraints. In support of this claim, we propose two distinct but complementary algorithms that overcome tractability issues in POMDP planning. PBVI is a sample-based approach that approximates the value function by planning over a small number of salient information states (belief points). PolCA+ is a hierarchical approach that leverages the structural properties of a problem to decompose it into a set of smaller, easier-to-solve subproblems. These techniques improve the tractability of POMDP planning to the point where POMDP-based robot controllers are a practical reality. This is demonstrated through the successful deployment of a nursing assistant robot.
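To make the point-based idea behind PBVI concrete, the sketch below illustrates one round of point-based value backups over a fixed set of belief points, assuming a small discrete POMDP represented as dense numpy arrays. The function name, array layout, and variable names are illustrative assumptions for this sketch, not the thesis's actual implementation.

```python
# Minimal sketch of a point-based (PBVI-style) value backup, assuming a
# small discrete POMDP given as dense numpy arrays. All names and shapes
# here are illustrative, not taken from the thesis.
import numpy as np

def point_based_backup(B, Gamma, T, O, R, gamma):
    """One round of point-based backups.

    B     : (n_beliefs, n_states)            belief points
    Gamma : (n_alphas, n_states)              current alpha-vectors
    T     : (n_actions, n_states, n_states)   T[a, s, s'] = P(s' | s, a)
    O     : (n_actions, n_states, n_obs)      O[a, s', z] = P(z | s', a)
    R     : (n_actions, n_states)             immediate reward R(s, a)
    gamma : discount factor

    Returns a new alpha-vector set, one vector per belief point.
    """
    n_actions, n_states, _ = T.shape
    n_obs = O.shape[2]
    new_Gamma = []
    for b in B:
        best_val, best_alpha = -np.inf, None
        for a in range(n_actions):
            # Start from the immediate reward vector for action a.
            alpha_a = R[a].copy()
            for z in range(n_obs):
                # Projected vectors g_i(s) = sum_s' P(z|s',a) P(s'|s,a) alpha_i(s'),
                # computed for every current alpha-vector at once.
                g = Gamma @ (T[a] * O[a][:, z]).T       # (n_alphas, n_states)
                # Keep only the projection that is best at this belief point.
                alpha_a += gamma * g[np.argmax(g @ b)]
            val = alpha_a @ b
            if val > best_val:
                best_val, best_alpha = val, alpha_a
        new_Gamma.append(best_alpha)
    return np.array(new_Gamma)
```

Iterating such backups over a small, carefully sampled set of belief points, and periodically expanding that set toward reachable beliefs, is the essence of the point-based strategy: the cost of each backup grows with the number of belief points rather than with the full continuous belief simplex.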
