Paper information - Teaching Multiple Tasks to an RL Agent using LTL

Teaching Multiple Tasks to an RL Agent using LTL

This paper examines the problem of how to teach multiple tasks to a Reinforcement Learning (RL) agent. To this end, we use Linear Temporal Logic (LTL) as a language for specifying multiple tasks in a manner that supports the composition of learned skills. We also propose a novel algorithm that exploits LTL progression and off-policy RL to speed up learning without compromising convergence guarantees, and show that our method outperforms the state-of-the-art approach on randomly generated Minecraft-like grids.
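As a rough illustration of the LTL progression mentioned in the abstract, the sketch below rewrites a task formula after each observation so that it describes what still remains to be done. It is not the paper's implementation: the tuple encoding of formulas, the function names (`prog`, `conj`, `disj`), and the proposition names (`wood`, `workbench`, `lava`, `goal`) are all assumptions made for this example, and negation is assumed to apply only to single propositions.

```python
# Formulas are encoded (hypothetically) as:
#   True / False                      constants
#   "p"                               atomic proposition
#   ("not", "p")                      negated proposition
#   ("and", a, b), ("or", a, b)       boolean connectives
#   ("next", a)                       a holds at the next step
#   ("until", a, b)                   a holds until b holds
#   ("eventually", a)                 a holds at some future step

def conj(a, b):
    """Conjunction with basic simplification."""
    if a is False or b is False:
        return False
    if a is True:
        return b
    if b is True:
        return a
    return a if a == b else ("and", a, b)

def disj(a, b):
    """Disjunction with basic simplification."""
    if a is True or b is True:
        return True
    if a is False:
        return b
    if b is False:
        return a
    return a if a == b else ("or", a, b)

def prog(f, sigma):
    """Progress formula f through one observation sigma (a set of true propositions)."""
    if isinstance(f, bool):
        return f
    if isinstance(f, str):
        return f in sigma
    op = f[0]
    if op == "not":
        inner = prog(f[1], sigma)
        if not isinstance(inner, bool):
            raise ValueError("negation is only supported on propositions in this sketch")
        return not inner
    if op == "and":
        return conj(prog(f[1], sigma), prog(f[2], sigma))
    if op == "or":
        return disj(prog(f[1], sigma), prog(f[2], sigma))
    if op == "next":
        return f[1]
    if op == "until":
        # Either b holds now, or a holds now and the until obligation persists.
        return disj(prog(f[2], sigma), conj(prog(f[1], sigma), f))
    if op == "eventually":
        # Either a holds now, or the obligation persists.
        return disj(prog(f[1], sigma), f)
    raise ValueError(f"unknown operator: {op}")

# Hypothetical task: get wood, then (at that step or later) use the workbench.
task = ("eventually", ("and", "wood", ("eventually", "workbench")))
after_wood = prog(task, {"wood"})
done = prog(after_wood, {"workbench"})

# Hypothetical safety-style task: avoid lava until the goal is reached.
safe = ("until", ("not", "lava"), "goal")
violated = prog(safe, {"lava"})
```

In this sketch a formula that progresses to `True` marks the task as completed and one that progresses to `False` marks it as failed. Any other result is a residual task that, under these assumptions, a learner could treat as a separate subtask.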

Sheila A. McIlraith | Richard Anthony Valenzano | Toryn Q. Klassen | Rodrigo Toro Icarte
