Intensional dynamic programming. A Rosetta stone for structured dynamic programming

We present intensional dynamic programming (IDP), a generic framework for structured dynamic programming over atomic, propositional, and relational representations of states and actions. We first develop set-based dynamic programming and show its equivalence to classical dynamic programming. We then show how to describe state sets intensionally using any form of structured knowledge representation, obtaining a generic algorithm that can optimally solve large, even infinite, MDPs without explicit state space enumeration. From this framework we derive two new Bellman backup operators and corresponding algorithms. To support the view of IDP as a Rosetta stone for structured dynamic programming, we review a range of existing techniques based on either propositional or relational knowledge representation.
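The core idea of set-based dynamic programming can be illustrated with a small sketch (hypothetical names and a toy MDP, not taken from the paper): states are grouped into blocks that behave identically under transitions and rewards, so a single Bellman backup per block updates all member states at once, with no explicit state enumeration.

```python
# Sketch of set-based value iteration over state blocks.
# The toy MDP and all names here are illustrative assumptions.

GAMMA = 0.9

# Three blocks of states and two actions. P[a][b] maps a successor
# block to its transition probability; R[b] is the per-block reward.
blocks = ["goal", "near", "far"]
R = {"goal": 1.0, "near": 0.0, "far": 0.0}
P = {
    "move": {"goal": {"goal": 1.0},
             "near": {"goal": 0.8, "near": 0.2},
             "far":  {"near": 0.7, "far": 0.3}},
    "stay": {b: {b: 1.0} for b in blocks},
}

def backup(V):
    """One set-based Bellman backup: max over actions of the
    expected discounted value, computed once per block."""
    return {b: R[b] + GAMMA * max(
                sum(p * V[b2] for b2, p in P[a][b].items())
                for a in P)
            for b in blocks}

V = {b: 0.0 for b in blocks}
for _ in range(200):  # iterate until (numerically) converged
    V = backup(V)
```

Because every state inside a block shares the same successor distribution over blocks, this blocked iteration converges to the same values as classical value iteration over the flat state space; here `V["goal"]` approaches 1/(1 - 0.9) = 10.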
