CAMPs: Learning Context-Specific Abstractions for Efficient Planning in Factored MDPs

Meta-planning, or learning to guide planning from experience, is a promising approach to reducing the computational cost of planning. A general meta-planning strategy is to learn to impose constraints on the states considered and actions taken by the agent. We observe that (1) imposing a constraint can induce context-specific independences that render some aspects of the domain irrelevant, and (2) an agent can take advantage of this fact by imposing constraints on its own behavior. These observations lead us to propose the context-specific abstract Markov decision process (CAMP), an abstraction of a factored MDP that affords efficient planning. We then describe how to learn which constraints to impose so that the resulting CAMP optimizes a trade-off between rewards and computational cost. Our experiments consider five planners across four domains, including robotic navigation among movable obstacles (NAMO), robotic task and motion planning for sequential manipulation, and classical planning. We find that planning with learned CAMPs consistently outperforms baselines, including Stilman's NAMO-specific algorithm. Video: this https URL
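To make the abstract's mechanism concrete, here is a minimal Python sketch of the idea, under assumptions of our own: a toy three-room factored MDP, hand-written read/write scopes standing in for a learned transition model, and a simple relevance fixed point standing in for the paper's abstraction procedure. None of the names below (DOMAINS, READS, plan_in_camp, etc.) come from the authors' code; this is an illustration of the technique, not their implementation.

```python
# Illustrative sketch only: a toy factored MDP in which forbidding the
# actions that touch one door makes that door's state irrelevant, so the
# planner can solve a smaller "abstract" MDP. All names are hypothetical.
from itertools import product

# Three rooms A, B, C in a row; door_AB connects A-B, door_BC connects B-C.
DOMAINS = {"robot": ("A", "B", "C"),
           "door_AB": ("open", "closed"),
           "door_BC": ("open", "closed")}

# Hand-written scopes: which variables each action's effects read and write.
# In the paper's setting this structure would come from the factored model.
READS = {"moveAB": {"robot", "door_AB"}, "moveBC": {"robot", "door_BC"},
         "toggleAB": {"door_AB"}, "toggleBC": {"door_BC"}}
WRITES = {"moveAB": {"robot"}, "moveBC": {"robot"},
          "toggleAB": {"door_AB"}, "toggleBC": {"door_BC"}}
REWARD_SCOPE = {"robot"}  # the reward reads only the robot's location


def step(state, action):
    """Deterministic toy transition function over full states (dicts)."""
    s = dict(state)
    if action == "moveAB" and s["door_AB"] == "open" and s["robot"] != "C":
        s["robot"] = "B" if s["robot"] == "A" else "A"
    elif action == "moveBC" and s["door_BC"] == "open" and s["robot"] != "A":
        s["robot"] = "C" if s["robot"] == "B" else "B"
    elif action == "toggleAB":
        s["door_AB"] = "closed" if s["door_AB"] == "open" else "open"
    elif action == "toggleBC":
        s["door_BC"] = "closed" if s["door_BC"] == "open" else "open"
    return s


def reward(state):
    return 1.0 if state["robot"] == "B" else 0.0


def relevant_vars(allowed_actions):
    """Relevance fixed point: start from the reward's scope; any variable
    read by an allowed action that writes a relevant variable is relevant.
    Variables never pulled in are independent of reward *in the context of
    the constraint* -- the context-specific independence from the abstract."""
    rel, changed = set(REWARD_SCOPE), True
    while changed:
        changed = False
        for a in allowed_actions:
            if WRITES[a] & rel and not READS[a] <= rel:
                rel |= READS[a]
                changed = True
    return sorted(rel)


def plan_in_camp(allowed_actions, gamma=0.9, iters=60):
    """Value iteration over only the relevant variables (abstract states)."""
    rel = relevant_vars(allowed_actions)
    key = lambda s: tuple((v, s[v]) for v in rel)
    abstract = [dict(zip(rel, vals))
                for vals in product(*(DOMAINS[v] for v in rel))]
    # Irrelevant variables get arbitrary defaults: under the constraint,
    # no allowed action consults them when changing a relevant variable.
    defaults = {v: DOMAINS[v][0] for v in DOMAINS if v not in rel}
    V = {key(s): 0.0 for s in abstract}
    for _ in range(iters):
        for s in abstract:
            nxt = [step({**defaults, **s}, a) for a in allowed_actions]
            V[key(s)] = max(reward(n) + gamma * V[key(n)] for n in nxt)
    return rel, V


# Unconstrained, all three variables are relevant (12 abstract states);
# forbidding moveBC/toggleBC renders door_BC irrelevant (6 abstract states).
for allowed in ({"moveAB", "moveBC", "toggleAB", "toggleBC"},
                {"moveAB", "toggleAB"}):
    rel, V = plan_in_camp(allowed)
    print(sorted(allowed), "-> relevant:", rel, "| abstract states:", len(V))
```

Forbidding moveBC and toggleBC plays the role of a learned constraint here: it sacrifices nothing (room B is reachable through door_AB alone) yet halves the number of states the planner must consider, which is the reward-versus-computation trade-off that CAMP learning is described as optimizing.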

[1] Lydia E. Kavraki et al. Learning Feasibility for Task and Motion Planning in Tabletop Environments, 2019, IEEE Robotics and Automation Letters.

[2] James J. Kuffner et al. Navigation among movable obstacles: real-time reasoning in complex environments, 2004, IEEE/RAS International Conference on Humanoid Robots.

[3] Nicolas Mansard et al. Learning Feasibility Constraints for Multicontact Locomotion of Legged Robots, 2017, Robotics: Science and Systems.

[4] Jesse Hoey et al. SPUDD: Stochastic Planning using Decision Diagrams, 1999, UAI.

[5] Dylan Hadfield-Menell et al. Guided search for task and motion plans using learned heuristics, 2016, IEEE International Conference on Robotics and Automation (ICRA).

[6] Jung-Su Ha et al. Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image, 2020, Robotics: Science and Systems.

[7] Andrew G. Barto et al. Efficient skill learning using abstraction selection, 2009, IJCAI.

[8] Craig Boutilier et al. Correlated Action Effects in Decision Theoretic Regression, 1997, UAI.

[9] Leslie Pack Kaelbling et al. Relational envelope-based planning, 2008.

[10] Thomas J. Walsh et al. Towards a Unified Theory of State Abstraction for MDPs, 2006, AI&M.

[11] Malte Helmert et al. The Fast Downward Planning System, 2006, J. Artif. Intell. Res.

[12] Craig Boutilier et al. Decision-Theoretic Planning: Structural Assumptions and Computational Leverage, 1999, J. Artif. Intell. Res.

[13] Steven M. LaValle et al. RRT-connect: An efficient approach to single-query path planning, 2000, IEEE International Conference on Robotics and Automation (ICRA).

[14] David Poole et al. Context-specific approximation in probabilistic inference, 1998, UAI.

[15] Joost Broekens et al. Think Too Fast Nor Too Slow: The Computational Trade-off Between Planning And Reinforcement Learning, 2020, arXiv.

[16] Jörg Hoffmann et al. FF: The Fast-Forward Planning System, 2001, AI Mag.

[17] D. Bertsekas et al. Adaptive aggregation methods for infinite horizon dynamic programming, 1989.

[18] Jung-Su Ha et al. Deep Visual Heuristics: Learning Feasibility of Mixed-Integer Programs for Manipulation Planning, 2020, IEEE International Conference on Robotics and Automation (ICRA).

[19] Leslie Pack Kaelbling et al. Guiding Search in Continuous State-Action Spaces by Learning an Action Sampler From Off-Target Search Experience, 2018, AAAI.

[20] Nevin Lianwen Zhang et al. On the Role of Context-Specific Independence in Probabilistic Inference, 1999, IJCAI.

[21] Peter Stone et al. State Abstraction Discovery from Irrelevant State Variables, 2005, IJCAI.

[22] George Konidaris et al. Constructing Abstraction Hierarchies Using a Skill-Symbol Loop, 2015, IJCAI.

[23] Doina Precup et al. Value Preserving State-Action Abstractions, 2020, AISTATS.

[24] Trevor I. Dix et al. Proximity-Based Non-uniform Abstractions for Approximate Planning, 2014, J. Artif. Intell. Res.

[25] Leslie Pack Kaelbling et al. Learning to guide task and motion planning using score-space representation, 2017, IEEE International Conference on Robotics and Automation (ICRA).

[26] Carmel Domshlak et al. Efficient probabilistic reasoning in BNs with mutual exclusion and context-specific independence, 2004, Int. J. Intell. Syst.

[27] Nan Jiang et al. Abstraction Selection in Model-based Reinforcement Learning, 2015, ICML.

[28] Craig Boutilier et al. Context-Specific Independence in Bayesian Networks, 1996, UAI.

[29] Beomjoon Kim et al. Learning value functions with relational state representations for guiding task-and-motion planning, 2019, CoRL.

[30] Shobha Venkataraman et al. Efficient Solution Algorithms for Factored MDPs, 2003, J. Artif. Intell. Res.

[31] Kurt Steinkraus et al. Solving large stochastic planning problems using multiple dynamic abstractions, 2005.

[32] Demis Hassabis et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.

[33] Michael L. Littman et al. Near Optimal Behavior via Approximate State Abstraction, 2016, ICML.

[34] Pieter Abbeel et al. Combined task and motion planning through an extensible planner-independent interface layer, 2014, IEEE International Conference on Robotics and Automation (ICRA).

[35] Craig A. Knoblock et al. PDDL: The Planning Domain Definition Language, 1998.

[36] Robert L. Smith et al. Aggregation in Dynamic Programming, 1987, Oper. Res.

[37] Simon M. Lucas et al. A Survey of Monte Carlo Tree Search Methods, 2012, IEEE Transactions on Computational Intelligence and AI in Games.

[38] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.