Symbolic Plans as High-Level Instructions for Reinforcement Learning

Reinforcement learning (RL) agents seek to maximize the cumulative reward obtained when interacting with their environment. Users define tasks or goals for RL agents by designing specialized reward functions such that their maximization aligns with task satisfaction. This work explores the use of high-level symbolic action models as a framework for defining final-state goal tasks and automatically producing their corresponding reward functions. We also show how automated planning can be used to synthesize high-level plans that guide hierarchical RL (HRL) techniques toward efficiently learning adequate policies. We provide a formal characterization of taskable RL environments and describe sufficient conditions under which various notions of optimality (e.g., minimizing total cost, maximizing the probability of reaching the goal) can be guaranteed. Finally, we present an empirical evaluation showing that our approach converges to near-optimal solutions faster than standard RL and HRL methods, and that it provides an effective framework for transferring learned skills across multiple tasks in a given environment.
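To make the pipeline concrete, the following is a minimal Python sketch, not the authors' implementation, of one plausible reading of the idea: each step of a symbolic plan induces a subgoal reward function and a corresponding option that an HRL agent can train. The names `Option`, `make_reward`, and the delivery-style plan are illustrative assumptions.

```python
# Sketch: turning a symbolic plan into per-step reward functions for HRL.
# Symbolic states are modeled as sets of ground fluents (strings).

def make_reward(subgoal):
    """Return a reward function that pays +1 exactly when the current
    symbolic state satisfies `subgoal` (the planned action's effects)."""
    def reward(sym_state):
        return 1.0 if subgoal.issubset(sym_state) else 0.0
    return reward

class Option:
    """One temporally extended action: a low-level policy trained (e.g.,
    with Q-learning) to reach the subgoal named by a planned action."""
    def __init__(self, name, subgoal):
        self.name = name
        self.subgoal = subgoal
        self.reward_fn = make_reward(subgoal)

    def terminated(self, sym_state):
        # The option ends once its planned effects hold.
        return self.subgoal.issubset(sym_state)

# Suppose a classical planner returned this plan for a gold-delivery task;
# each step pairs an action name with its expected symbolic effects.
plan = [
    ("goto-mine", {"at-mine"}),
    ("get-gold",  {"at-mine", "has-gold"}),
    ("goto-base", {"has-gold", "at-base"}),
]

options = [Option(name, effects) for name, effects in plan]

# An HRL agent would execute the options in plan order, training each
# option's low-level policy on its own reward_fn until terminated() holds.
for opt in options:
    print(opt.name, opt.reward_fn({"has-gold", "at-base"}))
```

The design point this sketch illustrates is that the plan converts a single sparse final-state goal into a sequence of dense, locally checkable subgoal rewards, which is what lets the HRL agent learn each option's policy efficiently and reuse those options across tasks that share subgoals.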
