A Human-Centered Data-Driven Planner-Actor-Critic Architecture via Logic Programming

Recent successes of Reinforcement Learning (RL) allow an agent to learn policies that surpass human experts, but RL remains time- and data-hungry. By contrast, human learning is significantly faster because humans draw on prior and general knowledge and on multiple sources of information. In this paper, we propose a Planner-Actor-Critic architecture for huMAN-centered planning and learning (PACMAN), in which an agent uses its prior, high-level, deterministic symbolic knowledge to plan goal-directed actions, and also applies the actor-critic algorithm of RL to fine-tune its behavior towards both environmental rewards and human feedback. This work is the first unified framework in which knowledge-based planning, RL, and human teaching jointly contribute to an agent's policy learning. Our experiments demonstrate that PACMAN yields a significant jump-start at the early stage of learning, converges rapidly and with small variance, and is robust to inconsistent, infrequent, and misleading feedback.
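To make the architecture concrete, the loop below is a minimal, hypothetical sketch of how a symbolic planner's suggestions might be combined with a tabular actor-critic update driven by both environment reward and human feedback. It is not the paper's implementation: the toy environment, the planner stub `symbolic_plan`, and the teacher stub `get_human_feedback` are all assumptions made for illustration (PACMAN itself derives its plans from logic-programming-based symbolic knowledge).

```python
import numpy as np

# Minimal sketch of a planner-actor-critic loop (NOT the paper's code).
# Assumes a toy discrete environment with n_states states and n_actions
# actions; symbolic_plan and get_human_feedback are hypothetical stubs.

n_states, n_actions = 16, 4
alpha_actor, alpha_critic, gamma = 0.1, 0.1, 0.99

theta = np.zeros((n_states, n_actions))  # actor: softmax policy parameters
V = np.zeros(n_states)                   # critic: state-value estimates

def softmax_policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def symbolic_plan(s):
    """Hypothetical stand-in for the symbolic planner: returns a
    goal-directed action for state s from prior knowledge."""
    return s % n_actions

def step(s, a):
    """Hypothetical environment transition: returns (next_state, reward)."""
    return (s + a + 1) % n_states, float(a == symbolic_plan(s))

def get_human_feedback(s, a):
    """Hypothetical teacher signal; may be absent (0), inconsistent,
    or infrequent in practice."""
    return 0.0

for episode in range(100):
    s = 0
    for t in range(50):
        # Bias exploration toward the planner's suggestion early on,
        # then increasingly follow the learned stochastic policy.
        if np.random.rand() < max(0.1, 1.0 - episode / 50):
            a = symbolic_plan(s)
        else:
            a = np.random.choice(n_actions, p=softmax_policy(s))

        s_next, r_env = step(s, a)
        r = r_env + get_human_feedback(s, a)  # combine both signals

        # Standard actor-critic update with the TD error as advantage.
        td_error = r + gamma * V[s_next] - V[s]
        V[s] += alpha_critic * td_error
        grad_log_pi = -softmax_policy(s)      # d/d(theta[s]) log pi(a|s)
        grad_log_pi[a] += 1.0
        theta[s] += alpha_actor * td_error * grad_log_pi

        s = s_next
```

The design point the sketch illustrates is that the planner only shapes action selection; the single reward scalar fed to the critic mixes the environment reward with whatever human feedback is available, so both sources steer the same policy update.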
