Value Iteration with Options and State Aggregation

This paper presents a method for solving Markov Decision Processes (MDPs) that combines state abstraction and temporal abstraction. Specifically, we combine state aggregation with the options framework and demonstrate that the two work well together; indeed, the full benefit of each is realized only once they are combined. We introduce a hierarchical value iteration algorithm that first solves subgoals coarsely and then uses these approximate solutions to solve the MDP exactly. The algorithm solves several problems faster than vanilla value iteration.
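To make the two-stage idea concrete, here is a minimal sketch in Python with NumPy, not the paper's exact algorithm: each option is summarized by an expected-reward vector and a discounted transition model, value iteration backs up over options exactly as it would over primitive actions, and a cheap solve on an aggregated model warm-starts the exact solve. All function names, array shapes, and the within-cluster averaging used for aggregation are illustrative assumptions.

```python
import numpy as np

def option_value_iteration(R, P, V0=None, tol=1e-8, max_iters=10_000):
    """Value iteration where each 'action' is an option model.

    R: [O, S] expected cumulative reward for running option o from state s.
    P: [O, S, S] option transition models with discounting folded in
       (entry [o, s, s'] ~ E[gamma^duration * 1{land in s'}]), so rows
       sum to less than 1 and the backup is a contraction.
    """
    V = np.zeros(R.shape[1]) if V0 is None else V0.astype(float).copy()
    for _ in range(max_iters):
        Q = R + P @ V              # Q[o, s] = R[o, s] + sum_s' P[o, s, s'] V[s']
        V_new = Q.max(axis=0)      # greedy backup over options
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V_new

def coarse_then_exact(R, P, phi, num_clusters):
    """Two-stage solve: aggregate states via the cluster map phi, solve the
    small aggregated MDP, lift its values back to ground states, and use
    them to warm-start the exact option-level solve."""
    O, S = R.shape
    M = np.zeros((S, num_clusters))          # membership: M[s, c] = 1 iff phi[s] == c
    M[np.arange(S), phi] = 1.0
    W = M / M.sum(axis=0, keepdims=True).clip(min=1.0)  # within-cluster averaging
    R_c = R @ W                                          # [O, C] aggregated rewards
    P_c = np.einsum('sc,ost,td->ocd', W, P, M)           # [O, C, C] aggregated dynamics
    V_coarse = option_value_iteration(R_c, P_c)          # cheap coarse solve
    return option_value_iteration(R, P, V0=M @ V_coarse) # warm-started exact solve
```

Because the coarse values only initialize the exact pass, the aggregation affects speed rather than correctness. The paper's algorithm uses the coarse stage per subgoal to build the option models themselves; the sketch compresses this into a single warm-start to stay short.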
