Systematic Generalisation through Task Temporal Logic and Deep Reinforcement Learning

This paper presents a neuro-symbolic agent that combines deep reinforcement learning (DRL) with temporal logic (TL) and achieves systematic out-of-distribution generalisation in tasks that involve following a formally specified instruction. Specifically, the agent learns general notions of negation and disjunction, and successfully applies them to previously unseen objects without further training. To this end, we also introduce Task Temporal Logic (TTL), a learning-oriented formal language whose atoms are designed to help the training of a DRL agent targeting systematic generalisation. To validate this combination of logic-based and neural-network techniques, we provide experimental evidence for the kind of neural-network architecture that most enhances the generalisation performance of the agent. Our findings suggest that the right architecture can significantly improve the ability of the agent to generalise in systematic ways, even with abstract operators, such as negation, which previous research has struggled with.
