Logic and the 2-Simplicial Transformer

We introduce the $2$-simplicial Transformer, an extension of the Transformer that includes a form of higher-dimensional attention generalising dot-product attention, and uses this attention to update entity representations with tensor products of value vectors. We show that this architecture provides a useful inductive bias for logical reasoning in the context of deep reinforcement learning.
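To make the idea of attention over pairs of entities concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a plain trilinear form for the logits (a sum over the feature dimension of the product of one query and two keys), a softmax taken jointly over all pairs $(j, k)$, and a hypothetical projection matrix `B` that maps the tensor product of two value vectors back to the entity dimension. The function name, argument names, and shapes are all illustrative assumptions.

```python
import numpy as np

def two_simplicial_attention(q, k1, k2, v, B):
    """Illustrative attention over pairs of entities (assumed trilinear logits).

    q, k1, k2: arrays of shape (n, d), one query and two keys per entity.
    v:         array of shape (n, dv), one value vector per entity.
    B:         array of shape (dv * dv, d_out), a hypothetical projection
               applied to the flattened tensor product of two value vectors.
    """
    n = q.shape[0]

    # Logit for entity i attending to the pair (j, k): sum_d q[i,d] k1[j,d] k2[k,d].
    # This trilinear form is an assumption made for illustration.
    logits = np.einsum("id,jd,kd->ijk", q, k1, k2)

    # Softmax jointly over all pairs (j, k) for each i, with max-subtraction
    # for numerical stability.
    flat = logits.reshape(n, -1)
    flat = flat - flat.max(axis=1, keepdims=True)
    w = np.exp(flat)
    w /= w.sum(axis=1, keepdims=True)
    weights = w.reshape(n, n, n)

    # Tensor products v[j] (x) v[k], flattened to vectors of length dv * dv.
    pair_values = np.einsum("ja,kb->jkab", v, v).reshape(n, n, -1)

    # Weighted sum of tensor products for each entity, then project with B.
    mixed = np.einsum("ijk,jkc->ic", weights, pair_values)
    return mixed @ B, weights
```

With $n$ entities this sketch stores $n^3$ attention weights, compared with $n^2$ for ordinary dot-product attention, so it is only practical for small $n$ as written.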
