S2RMs: Spatially Structured Recurrent Modules

Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalize well and are robust to changes in the input distribution. While methods that harness spatial and temporal structures find broad application, recent work has demonstrated the potential of models that leverage sparse and modular structure using an ensemble of sparingly interacting modules. In this work, we take a step towards dynamic models that are capable of simultaneously exploiting both modular and spatiotemporal structures. We accomplish this by abstracting the modeled dynamical system as a collection of autonomous but sparsely interacting sub-systems. The sub-systems interact according to a topology that is learned, but also informed by the spatial structure of the underlying real-world system. This results in a class of models that are well suited for modeling the dynamics of systems that only offer local views into their state, along with the corresponding spatial locations of those views. On the tasks of video prediction from cropped frames and multi-agent world modeling from partial observations in the challenging StarCraft II domain, we find our models to be more robust to the number of available views and better able to generalize to novel tasks without additional training, even when compared against strong baselines that perform equally well or better on the training distribution.
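
To make the architectural idea concrete, below is a minimal sketch in PyTorch (not the authors' released implementation) of a set of recurrent modules that each consume a local view together with its spatial location, and that communicate through a learned attention topology modulated by a squared-exponential kernel on the distance between view locations. The class name, dimensions, and the specific kernel are illustrative assumptions, not the paper's exact formulation.

    # Hypothetical sketch of spatially structured recurrent modules:
    # each module is a GRU cell tied to one local view, and modules exchange
    # messages through attention that is down-weighted with spatial distance.
    import torch
    import torch.nn as nn

    class SpatiallyStructuredModules(nn.Module):
        def __init__(self, num_modules, view_dim, hidden_dim, length_scale=1.0):
            super().__init__()
            self.cells = nn.ModuleList(
                nn.GRUCell(view_dim, hidden_dim) for _ in range(num_modules)
            )
            # Learned projections defining the communication topology.
            self.query = nn.Linear(hidden_dim, hidden_dim, bias=False)
            self.key = nn.Linear(hidden_dim, hidden_dim, bias=False)
            self.value = nn.Linear(hidden_dim, hidden_dim, bias=False)
            self.length_scale = length_scale
            self.hidden_dim = hidden_dim

        def forward(self, views, positions, states):
            # views:     (batch, num_modules, view_dim)   local observations
            # positions: (batch, num_modules, pos_dim)    spatial locations of the views
            # states:    (batch, num_modules, hidden_dim) previous module states
            new_states = torch.stack(
                [cell(views[:, i], states[:, i]) for i, cell in enumerate(self.cells)],
                dim=1,
            )

            # Learned attention between module states ...
            q, k, v = self.query(new_states), self.key(new_states), self.value(new_states)
            logits = torch.einsum("bid,bjd->bij", q, k) / self.hidden_dim ** 0.5

            # ... informed by spatial structure: a squared-exponential kernel on the
            # pairwise distances between view locations (one plausible choice).
            dist2 = (positions.unsqueeze(2) - positions.unsqueeze(1)).pow(2).sum(-1)
            spatial_kernel = torch.exp(-dist2 / (2.0 * self.length_scale ** 2))

            attn = torch.softmax(logits, dim=-1) * spatial_kernel
            attn = attn / attn.sum(-1, keepdim=True).clamp_min(1e-8)

            # Spatially informed message passing between sub-systems.
            messages = torch.einsum("bij,bjd->bid", attn, v)
            return new_states + messages

    # Example usage with toy shapes.
    model = SpatiallyStructuredModules(num_modules=4, view_dim=16, hidden_dim=32)
    views = torch.randn(2, 4, 16)
    positions = torch.rand(2, 4, 2)
    states = torch.zeros(2, 4, 32)
    next_states = model(views, positions, states)
    print(next_states.shape)  # torch.Size([2, 4, 32])

In this sketch, the spatial kernel is what ties the learned communication topology to the spatial structure of the underlying system: modules attached to nearby views exchange messages more strongly than modules attached to distant ones.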
