PEORL: Integrating Symbolic Planning and Hierarchical Reinforcement Learning for Robust Decision-Making

Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with the real world, which often requires an infeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper we present a unified framework, PEORL, that integrates symbolic planning with hierarchical reinforcement learning (HRL) to handle decision-making in dynamic environments with uncertainty. Symbolic plans are used to guide the agent's task execution and learning, and the learned experience is fed back into the symbolic knowledge to improve planning. This method leads to rapid policy search and robust symbolic plans in complex domains. The framework is evaluated on benchmark HRL domains.
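
To make the plan-learn-replan interplay concrete, below is a minimal sketch of such a loop. Every name in it (symbolic_plan, execute_option, the per-option cost update) is a hypothetical stand-in: PEORL's actual planner solves action-language domain descriptions with an answer set solver, and its learner is R-learning over options, neither of which is reproduced in this toy.

```python
import random
from collections import defaultdict

def symbolic_plan(costs):
    """Stand-in for the symbolic planner: greedily builds a fixed-horizon
    plan of (state, action, next_state) transitions that minimizes the
    learned action costs. The real planner solves an action-language
    description with an ASP solver instead."""
    plan, state = [], "s0"
    for _ in range(3):
        action = min(("a1", "a2"), key=lambda a: costs[(state, a)])
        nxt = f"{state}/{action}"
        plan.append((state, action, nxt))
        state = nxt
    return plan

def execute_option(state, action):
    """Stand-in for executing one plan step as an option in the
    environment; returns a noisy reward (a1 is secretly better)."""
    return random.gauss(1.0 if action == "a1" else 0.5, 0.1)

costs = defaultdict(float)  # learned action costs fed back to the planner
rho = defaultdict(float)    # running average-reward estimate per option
alpha = 0.1                 # learning rate

for episode in range(50):
    plan = symbolic_plan(costs)        # plan against current cost estimates
    for state, action, _ in plan:      # each plan step executed as an option
        r = execute_option(state, action)
        # Crude average-reward update (standing in for R-learning's rho).
        rho[(state, action)] += alpha * (r - rho[(state, action)])
        # Feedback channel: high-reward options become low-cost to plan with.
        costs[(state, action)] = -rho[(state, action)]

print(symbolic_plan(costs))  # the plan now favors the high-reward options
```

The key feedback channel is the last line of the inner loop: options that earn higher average reward during execution become cheaper for the planner, so subsequent plans prefer them, which is the planning-learning integration the abstract describes.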
