Decision-Theoretic Planning: Structural Assumptions and Computational Leverage

Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory, and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to describe performance criteria, in the functions used to describe state transitions and observations, and in the relationships among features used to describe states, actions, rewards, and observations. Specialized representations, and algorithms employing these representations, can achieve computational leverage by exploiting these various forms of structure. Certain AI techniques, in particular those based on the use of structured, intensional representations, can be viewed in this way. This paper surveys several types of representations for both classical and decision-theoretic planning problems, and planning algorithms that exploit these representations in a number of different ways to ease the computational burden of constructing policies or plans. It focuses primarily on abstraction, aggregation, and decomposition techniques based on AI-style representations.
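To make the framework concrete, the sketch below (not drawn from the paper itself) illustrates the kind of structure the survey discusses: a small MDP whose transition model is specified feature by feature, in the style of a dynamic Bayesian network, and then solved by ordinary value iteration. All feature names, actions, probabilities, and rewards are invented for illustration.

```python
# Illustrative sketch only: a factored MDP whose transition model is given
# per state variable (a simple DBN-style factorization), solved with flat
# value iteration. Everything here is an assumed toy example, not the
# paper's algorithm.
from itertools import product

# States are assignments to three boolean features; the flat state space
# has 2**3 = 8 states, but the model is described one feature at a time.
FEATURES = 3
STATES = list(product([0, 1], repeat=FEATURES))
ACTIONS = ["noop", "fix0"]
GAMMA = 0.9

def feature_transition(action, state, i):
    """P(feature i = 1 next | current state, action). Each feature depends
    only on its own current value and the action, so the model needs
    O(#features) numbers rather than O(|S|^2)."""
    if action == "fix0" and i == 0:
        return 0.9                      # 'fix0' tends to switch feature 0 on
    return 0.95 if state[i] else 0.05   # other features tend to persist

def transition_prob(action, s, s2):
    """Flat transition probability assembled from the factored model."""
    p = 1.0
    for i in range(FEATURES):
        p1 = feature_transition(action, s, i)
        p *= p1 if s2[i] else (1.0 - p1)
    return p

def reward(s):
    return float(sum(s))  # reward: number of features that are 'on'

# Standard synchronous value iteration over the enumerated state space.
V = {s: 0.0 for s in STATES}
for _ in range(100):
    V = {
        s: max(
            reward(s)
            + GAMMA * sum(transition_prob(a, s, s2) * V[s2] for s2 in STATES)
            for a in ACTIONS
        )
        for s in STATES
    }

# Greedy policy extraction from the converged value function.
policy = {
    s: max(
        ACTIONS,
        key=lambda a: sum(transition_prob(a, s, s2) * V[s2] for s2 in STATES),
    )
    for s in STATES
}
print(policy[(0, 1, 1)])  # e.g. -> 'fix0'
```

Note that the value-iteration step above still enumerates the flat state space; the structured methods the paper surveys aim to do better, operating directly on representations such as decision trees or decision diagrams over the features so that this enumeration can itself be avoided or approximated.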
