Matrix Shuffle-Exchange Networks for Hard 2D Tasks

Convolutional neural networks have become the main tool for processing two-dimensional data. They work well for images, yet convolutions have a limited receptive field that prevents their application to more complex 2D tasks. We propose a new neural model, called the Matrix Shuffle-Exchange network, that can efficiently exploit long-range dependencies in 2D data and has speed comparable to a convolutional neural network. It is derived from the Neural Shuffle-Exchange network and has $\mathcal{O}( \log{n})$ layers and $\mathcal{O}( n^2 \log{n})$ total time and space complexity for processing an $n \times n$ data matrix. We show that the Matrix Shuffle-Exchange network is well-suited for algorithmic and logical reasoning tasks on matrices and dense graphs, exceeding convolutional and graph neural network baselines. Its distinct advantage is the capability of retaining full long-range dependency modelling when generalizing to larger instances, much larger than could be processed with models equipped with a dense attention mechanism.
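
As a rough illustration of where the $\mathcal{O}( \log{n})$ depth comes from, the sketch below (a minimal illustration, not the paper's implementation; the function names are ours) shows the classic perfect-shuffle permutation on which Shuffle-Exchange networks are based: interleaving the two halves of a sequence corresponds to a cyclic rotation of the binary position address, so after roughly $\log_2 n$ shuffle-and-exchange layers information from any position can reach any other. One way to read the stated $\mathcal{O}( n^2 \log{n})$ bound is that this $\mathcal{O}( \log{n})$-depth mixing is applied over all $n^2$ cells of the $n \times n$ matrix.

```python
import math

def perfect_shuffle(seq):
    """Riffle-shuffle permutation: interleave the two halves of the sequence.

    For a sequence of length 2**k this equals a cyclic rotation of the binary
    position address, the routing step used by classic shuffle-exchange
    networks (illustrative only; not the Matrix Shuffle-Exchange layer itself).
    """
    half = len(seq) // 2
    out = []
    for a, b in zip(seq[:half], seq[half:]):
        out.extend([a, b])
    return out

def layers_to_full_mixing(n):
    """Number of shuffle-and-exchange layers after which information from any
    position can reach any other -- the source of the O(log n) depth."""
    return math.ceil(math.log2(n))

# Toy example on 8 elements: three shuffle/exchange layers suffice.
x = list(range(8))
print(perfect_shuffle(x))        # [0, 4, 1, 5, 2, 6, 3, 7]
print(layers_to_full_mixing(8))  # 3
```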
