Learning to Reason with Third-Order Tensor Products

We combine Recurrent Neural Networks with Tensor Product Representations to learn combinatorial representations of sequential data. This improves symbolic interpretation and systematic generalisation. Our architecture is trained end-to-end through gradient descent on a variety of simple natural language reasoning tasks, significantly outperforming the latest state-of-the-art models in single-task and all-tasks settings. We also augment a subset of the data such that training and test data exhibit large systematic differences and show that our approach generalises better than the previous state-of-the-art.
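The core idea behind a third-order Tensor Product Representation is to bind an entity, a relation, and a value together by an outer product and to read a value back by contracting the resulting third-order tensor with the entity and relation. The sketch below illustrates only that bind/unbind step in NumPy; the function names, dimensions, and unit-norm assumption are illustrative choices, not the paper's trained end-to-end architecture, in which an RNN produces these vectors from the input sequence.

```python
import numpy as np

# Minimal sketch of a third-order Tensor Product Representation (TPR).
# Dimensions and names are illustrative assumptions, not the authors' code.
d_e, d_r, d_v = 8, 4, 8  # entity, relation, and value dimensions (assumed)

def bind(memory, entity, relation, value):
    """Add the outer product entity (x) relation (x) value to the 3D memory tensor."""
    return memory + np.einsum('i,j,k->ijk', entity, relation, value)

def unbind(memory, entity, relation):
    """Retrieve the value bound to an (entity, relation) pair by tensor contraction."""
    return np.einsum('ijk,i,j->k', memory, entity, relation)

# Usage: store one fact and read it back.
memory = np.zeros((d_e, d_r, d_v))
entity = np.random.randn(d_e); entity /= np.linalg.norm(entity)
relation = np.random.randn(d_r); relation /= np.linalg.norm(relation)
value = np.random.randn(d_v)

memory = bind(memory, entity, relation, value)
retrieved = unbind(memory, entity, relation)  # close to `value` when entity and relation have unit norm
print(np.allclose(retrieved, value, atol=1e-6))
```

With unit-norm entity and relation vectors, the contraction recovers the stored value exactly; interference only arises once several facts sharing similar keys are superimposed in the same tensor.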
