Recurrent Independent Mechanisms

Learning modular structures which reflect the dynamics of the environment can lead to better generalization and robustness to changes which only affect a few of the underlying causes. We propose Recurrent Independent Mechanisms (RIMs), a new recurrent architecture in which multiple groups of recurrent cells operate with nearly independent transition dynamics, communicate only sparingly through the bottleneck of attention, and are only updated at time steps where they are most relevant. We show that this leads to specialization amongst the RIMs, which in turn allows for dramatically improved generalization on tasks where some factors of variation differ systematically between training and evaluation.
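To make the three ingredients named above concrete (separate transition dynamics per module, sparse communication through attention, and updates only for the most relevant modules), here is a minimal NumPy sketch of a single time step. It is an illustration under assumptions, not the authors' implementation: the class name `ToyModularRNN`, the plain tanh cells, the dot-product relevance score, the top-k selection rule, and all weight shapes are hypothetical choices that the abstract does not specify.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ToyModularRNN:
    """Illustrative only: several small RNN modules with sparse updates."""

    def __init__(self, n_modules, hidden, n_in, k_active, seed=0):
        rng = np.random.default_rng(seed)
        scale = 0.1
        self.k_active = k_active
        # Separate parameters per module (leading axis indexes the module),
        # so each module has its own transition dynamics.
        self.W_in = rng.normal(0, scale, (n_modules, n_in, hidden))
        self.W_rec = rng.normal(0, scale, (n_modules, hidden, hidden))
        # Assumed relevance scorer: maps each module's state to a query
        # that is compared with the current input.
        self.W_rel = rng.normal(0, scale, (n_modules, hidden, n_in))
        # Projections for attention between modules (assumed shared).
        self.W_q = rng.normal(0, scale, (hidden, hidden))
        self.W_k = rng.normal(0, scale, (hidden, hidden))
        self.W_v = rng.normal(0, scale, (hidden, hidden))

    def step(self, h, x):
        """h: (n_modules, hidden) states, x: (n_in,) input at this step."""
        # 1. Relevance of the input to each module; keep the top k.
        scores = np.einsum("kh,khi,i->k", h, self.W_rel, x)
        active = np.argsort(scores)[-self.k_active:]

        # 2. Each module proposes a new state from its own weights only.
        proposal = np.tanh(
            np.einsum("i,kih->kh", x, self.W_in)
            + np.einsum("kh,khj->kj", h, self.W_rec)
        )

        # 3. Communication bottleneck: modules read the previous states
        #    of all modules through scaled dot-product attention.
        q = proposal @ self.W_q
        keys = h @ self.W_k
        vals = h @ self.W_v
        attn = softmax(q @ keys.T / np.sqrt(h.shape[1]), axis=-1)
        proposal = proposal + attn @ vals

        # 4. Only the active modules are updated; the rest keep their state.
        h_next = h.copy()
        h_next[active] = proposal[active]
        return h_next, active
```

In this sketch, calling `step` repeatedly over a sequence leaves the states of non-selected modules untouched at each step, which is one simple way to realize "only updated at time steps where they are most relevant". A trained model would learn all of the weights; here they are random and serve only to show the data flow.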
