Neural Function Modules with Sparse Arguments: A Dynamic Approach to Integrating Information across Layers

Feed-forward neural networks consist of a sequence of layers, in which each layer processes the output of the previous layer. A downside to this approach is that each layer (or module, as multiple modules can operate in parallel) is tasked with processing the entire hidden state, rather than only the part of the state that is most relevant to it. Functions that operate on only a small number of input variables are an essential part of most programming languages, and they enable modularity and code re-use. Our proposed method, Neural Function Modules (NFM), aims to introduce the same structural capability into deep learning. Most prior work on combining top-down and bottom-up feedback in feed-forward networks is limited to classification problems. The key contribution of our work is a flexible algorithm that combines attention, sparsity, and top-down and bottom-up feedback, and which, as we show, improves results on standard classification, out-of-domain generalization, generative modeling, and representation learning in the context of reinforcement learning.
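
The idea of a module that reads only a sparse subset of its possible inputs can be illustrated with a minimal sketch. This is not the authors' implementation: the class name SparseArgumentModule, the shapes, and hyper-parameters such as k are illustrative assumptions. A module attends over candidate feature vectors coming from other layers (bottom-up or top-down), keeps only the top-k of them as its "arguments", and updates its state from that sparse subset.

    # Minimal sketch (assumed names and shapes) of a sparse-argument module:
    # attention selects a small number of candidate features, and only those
    # are used to update the module's state.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SparseArgumentModule(nn.Module):
        def __init__(self, dim, k=2):
            super().__init__()
            self.k = k                      # number of "arguments" the module attends to
            self.query = nn.Linear(dim, dim)
            self.key = nn.Linear(dim, dim)
            self.value = nn.Linear(dim, dim)
            self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

        def forward(self, state, candidates):
            # state:      (batch, dim)     current module input
            # candidates: (batch, n, dim)  features from other layers (bottom-up and top-down)
            q = self.query(state).unsqueeze(1)                      # (batch, 1, dim)
            k = self.key(candidates)                                # (batch, n, dim)
            v = self.value(candidates)                              # (batch, n, dim)
            scores = (q * k).sum(-1) / k.shape[-1] ** 0.5           # (batch, n) attention scores
            topk = scores.topk(self.k, dim=-1)                      # keep only the k best candidates
            weights = F.softmax(topk.values, dim=-1).unsqueeze(-1)  # (batch, k, 1)
            picked = v.gather(1, topk.indices.unsqueeze(-1).expand(-1, -1, v.shape[-1]))
            context = (weights * picked).sum(1)                     # sparse attention read-out
            return state + self.ffn(context)                        # residual update of the state

    # Usage: the module sees 6 candidate feature vectors but attends to only 2 of them.
    module = SparseArgumentModule(dim=64, k=2)
    out = module(torch.randn(8, 64), torch.randn(8, 6, 64))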
