Decision-Theoretic Planning: Structural Assumptions and Computational Leverage

Planning under uncertainty is a central problem in the study of automated sequential decision making, and has been addressed by researchers in many different fields, including AI planning, decision analysis, operations research, control theory, and economics. While the assumptions and perspectives adopted in these areas often differ in substantial ways, many planning problems of interest to researchers in these fields can be modeled as Markov decision processes (MDPs) and analyzed using the techniques of decision theory. This paper presents an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI. It also describes structural properties of MDPs that, when exhibited by particular classes of problems, can be exploited in the construction of optimal or approximately optimal policies or plans. Planning problems commonly possess structure in the reward and value functions used to describe performance criteria, in the functions used to describe state transitions and observations, and in the relationships among features used to describe states, actions, rewards, and observations. Specialized representations, and algorithms employing these representations, can achieve computational leverage by exploiting these various forms of structure. Certain AI techniques, in particular those based on the use of structured, intensional representations, can be viewed in this way. This paper surveys several types of representations for both classical and decision-theoretic planning problems, and planning algorithms that exploit these representations in a number of different ways to ease the computational burden of constructing policies or plans. It focuses primarily on abstraction, aggregation, and decomposition techniques based on AI-style representations.
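To make the framework concrete, the sketch below (not drawn from the paper itself) illustrates the kind of structure the survey discusses: a small MDP whose transition model is specified feature by feature, in the style of a dynamic Bayesian network, and then solved by ordinary value iteration. All feature names, actions, probabilities, and rewards are invented for illustration.

```python
# Illustrative sketch only: a factored MDP whose transition model is given
# per state variable (a simple DBN-style factorization), solved with flat
# value iteration. Everything here is an assumed toy example, not the
# paper's algorithm.
from itertools import product

# States are assignments to three boolean features; the flat state space
# has 2**3 = 8 states, but the model is described one feature at a time.
FEATURES = 3
STATES = list(product([0, 1], repeat=FEATURES))
ACTIONS = ["noop", "fix0"]
GAMMA = 0.9

def feature_transition(action, state, i):
    """P(feature i = 1 next | current state, action). Each feature depends
    only on its own current value and the action, so the model needs
    O(#features) numbers rather than O(|S|^2)."""
    if action == "fix0" and i == 0:
        return 0.9                      # 'fix0' tends to switch feature 0 on
    return 0.95 if state[i] else 0.05   # other features tend to persist

def transition_prob(action, s, s2):
    """Flat transition probability assembled from the factored model."""
    p = 1.0
    for i in range(FEATURES):
        p1 = feature_transition(action, s, i)
        p *= p1 if s2[i] else (1.0 - p1)
    return p

def reward(s):
    return float(sum(s))  # reward: number of features that are 'on'

# Standard synchronous value iteration over the enumerated state space.
V = {s: 0.0 for s in STATES}
for _ in range(100):
    V = {
        s: max(
            reward(s)
            + GAMMA * sum(transition_prob(a, s, s2) * V[s2] for s2 in STATES)
            for a in ACTIONS
        )
        for s in STATES
    }

# Greedy policy extraction from the converged value function.
policy = {
    s: max(
        ACTIONS,
        key=lambda a: sum(transition_prob(a, s, s2) * V[s2] for s2 in STATES),
    )
    for s in STATES
}
print(policy[(0, 1, 1)])  # e.g. -> 'fix0'
```

Note that the value-iteration step above still enumerates the flat state space; the structured methods the paper surveys aim to do better, operating directly on representations such as decision trees or decision diagrams over the features so that this enumeration can itself be avoided or approximated.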
