A Human-Centered Data-Driven Planner-Actor-Critic Architecture via Logic Programming

Recent successes of Reinforcement Learning (RL) allow an agent to learn policies that surpass human experts, but RL remains time- and data-hungry. By contrast, human learning is significantly faster because humans draw on prior and general knowledge and on multiple sources of information. In this paper, we propose a Planner-Actor-Critic architecture for huMAN-centered planning and learning (PACMAN), in which an agent uses its prior, high-level, deterministic symbolic knowledge to plan goal-directed actions, and also applies the actor-critic algorithm of RL to fine-tune its behavior towards both environmental rewards and human feedback. This work is the first unified framework in which knowledge-based planning, RL, and human teaching jointly contribute to an agent's policy learning. Our experiments demonstrate that PACMAN yields a significant jump-start at the early stage of learning, converges rapidly and with small variance, and is robust to inconsistent, infrequent, and misleading feedback.
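To make the architecture concrete, the loop below is a minimal, hypothetical sketch of how a symbolic planner's suggestions might be combined with a tabular actor-critic update driven by both environment reward and human feedback. It is not the paper's implementation: the toy environment, the planner stub `symbolic_plan`, and the teacher stub `get_human_feedback` are all assumptions made for illustration (PACMAN itself derives its plans from logic-programming-based symbolic knowledge).

```python
import numpy as np

# Minimal sketch of a planner-actor-critic loop (NOT the paper's code).
# Assumes a toy discrete environment with n_states states and n_actions
# actions; symbolic_plan and get_human_feedback are hypothetical stubs.

n_states, n_actions = 16, 4
alpha_actor, alpha_critic, gamma = 0.1, 0.1, 0.99

theta = np.zeros((n_states, n_actions))  # actor: softmax policy parameters
V = np.zeros(n_states)                   # critic: state-value estimates

def softmax_policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def symbolic_plan(s):
    """Hypothetical stand-in for the symbolic planner: returns a
    goal-directed action for state s from prior knowledge."""
    return s % n_actions

def step(s, a):
    """Hypothetical environment transition: returns (next_state, reward)."""
    return (s + a + 1) % n_states, float(a == symbolic_plan(s))

def get_human_feedback(s, a):
    """Hypothetical teacher signal; may be absent (0), inconsistent,
    or infrequent in practice."""
    return 0.0

for episode in range(100):
    s = 0
    for t in range(50):
        # Bias exploration toward the planner's suggestion early on,
        # then increasingly follow the learned stochastic policy.
        if np.random.rand() < max(0.1, 1.0 - episode / 50):
            a = symbolic_plan(s)
        else:
            a = np.random.choice(n_actions, p=softmax_policy(s))

        s_next, r_env = step(s, a)
        r = r_env + get_human_feedback(s, a)  # combine both signals

        # Standard actor-critic update with the TD error as advantage.
        td_error = r + gamma * V[s_next] - V[s]
        V[s] += alpha_critic * td_error
        grad_log_pi = -softmax_policy(s)      # d/d(theta[s]) log pi(a|s)
        grad_log_pi[a] += 1.0
        theta[s] += alpha_actor * td_error * grad_log_pi

        s = s_next
```

The design point the sketch illustrates is that the planner only shapes action selection; the single reward scalar fed to the critic mixes the environment reward with whatever human feedback is available, so both sources steer the same policy update.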
