Matrix Shuffle-Exchange Networks for Hard 2D Tasks
[1] Li Yang, et al. Big Bird: Transformers for Longer Sequences, 2020, NeurIPS.
[2] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Rishabh Singh, et al. Towards Modular Algorithm Induction, 2019, ArXiv.
[4] François Le Gall, et al. Improved Quantum Algorithm for Triangle Finding via Combinatorial Arguments, 2014, 2014 IEEE 55th Annual Symposium on Foundations of Computer Science.
[5] Zhuwen Li, et al. Combinatorial Optimization with Graph Convolutional Networks and Guided Tree Search, 2018, NeurIPS.
[6] Vladlen Koltun, et al. Multi-Scale Context Aggregation by Dilated Convolutions, 2015, ICLR.
[7] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[8] François Le Gall, et al. Powers of tensors and fast matrix multiplication, 2014, ISSAC.
[9] Ilya Sutskever, et al. Generating Long Sequences with Sparse Transformers, 2019, ArXiv.
[10] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[11] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[12] Agris Sostaks, et al. Neural Shuffle-Exchange Networks – Sequence Processing in O(n log n) Time, 2019, NeurIPS.
[13] Meng Wang, et al. Graphonomy: Universal Human Parsing via Graph Transfer Learning, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Kevin Gimpel, et al. Gaussian Error Linear Units (GELUs), 2016, ArXiv.
[15] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, ArXiv.
[16] Manjish Pal, et al. Fast Approximate Matrix Multiplication by Solving Linear Systems, 2014, Electron. Colloquium Comput. Complex.
[17] Nathan S. Netanyahu, et al. DeepChess: End-to-End Deep Neural Network for Automatic Learning in Chess, 2016, ICANN.
[18] Gary McGuire, et al. There Is No 16-Clue Sudoku: Solving the Sudoku Minimum Number of Clues Problem via Hitting Set Enumeration, 2012, Exp. Math.
[19] Philip S. Yu, et al. A Comprehensive Survey on Graph Neural Networks, 2019, IEEE Transactions on Neural Networks and Learning Systems.
[20] Donald F. Towsley, et al. Diffusion-Convolutional Neural Networks, 2015, NIPS.
[21] Rico Sennrich, et al. Root Mean Square Layer Normalization, 2019, NeurIPS.
[22] Roland Memisevic, et al. How far can we go without convolution: Improving fully-connected networks, 2015, ArXiv.
[23] Sergio Gomez Colmenarejo, et al. Hybrid computing using a neural network with dynamic external memory, 2016, Nature.
[24] Alessio Micheli, et al. Neural Network for Graphs: A Contextual Constructive Approach, 2009, IEEE Transactions on Neural Networks.
[25] Ran Raz, et al. On the complexity of matrix product, 2002, STOC '02.
[26] Jure Leskovec, et al. How Powerful are Graph Neural Networks?, 2018, ICLR.
[27] Priya L. Donti, et al. SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver, 2019, ICML.
[28] Han Fang, et al. Linformer: Self-Attention with Linear Complexity, 2020, ArXiv.
[29] Noga Alon, et al. Finding and counting given length cycles, 1997, Algorithmica.
[30] Demis Hassabis, et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, 2018, Science.
[31] Dawn Xiaodong Song, et al. Improving Neural Program Synthesis with Inferred Execution Traces, 2018, NeurIPS.
[32] Ole Winther, et al. Recurrent Relational Networks, 2017, NeurIPS.
[33] Dahua Lin, et al. Learning to Cluster Faces via Confidence and Connectivity Estimation, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Lukasz Kaiser, et al. Reformer: The Efficient Transformer, 2020, ICLR.
[35] Lukasz Kaiser, et al. Neural GPUs Learn Algorithms, 2015, ICLR.
[36] Marc Brockschmidt, et al. Learning to Represent Programs with Graphs, 2017, ICLR.
[37] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[38] Demis Hassabis, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, 2017, ArXiv.
[39] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[40] Liyuan Liu, et al. On the Variance of the Adaptive Learning Rate and Beyond, 2019, ICLR.
[41] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[42] Karlis Freivalds, et al. Improving the Neural GPU Architecture for Algorithm Learning, 2017, ArXiv.