Symmetric block-low-rank layers for fully reversible multilevel neural networks

Factors that limit the size of a neural network's input and output include the memory required to store the network states/activations for gradient computation, as well as the memory for the convolutional kernels and other weights. These memory restrictions are especially limiting for applications where we want to learn to map volumetric data to a desired output, such as video-to-video tasks. Recently developed fully reversible neural networks enable gradient computation while storing the network states of only a few layers. While this saves a tremendous amount of memory, the convolutional kernels become the dominant memory cost once a fully reversible network contains multiple invertible pooling/coarsening layers, because invertible coarsening operators such as the orthogonal wavelet transform cause the number of channels to grow explosively. We address this issue by combining fully reversible networks with layers that store the convolutional kernels directly in compressed form. Specifically, we introduce a layer with a symmetric block-low-rank structure, similar in spirit to bottleneck and squeeze-and-expand structures. Our contribution is symmetry by construction: by flattening the tensors and choosing suitable notation, we can interpret these network structures in a linear-algebraic fashion as a block-low-rank matrix in factorized form and read off several of its properties. A video-segmentation example shows that we can train a network to segment an entire video in one pass, which would not be possible, in terms of memory requirements, with non-reversible networks or previously proposed reversible networks.
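
The abstract describes the layer only at a high level; the following minimal Python/PyTorch sketch illustrates one way such a symmetric block-low-rank layer could look. All specifics here (the class name, the use of 3D convolutions, the ReLU nonlinearity, and the step size h) are illustrative assumptions rather than details taken from the paper: a single thin kernel K maps C channels to r < C channels and its transpose maps back, so the linearized weight K^T K is a symmetric block-low-rank matrix in factorized form, and only the thin factor K needs to be stored.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SymmetricBlockLowRankLayer(nn.Module):
    """Residual layer whose linear part acts as -h * K^T K with a thin K.

    K maps C input channels to r < C channels, so the implied C-by-C block
    weight matrix K^T K is symmetric and block-low-rank by construction,
    while only the thin factor K is stored.
    """

    def __init__(self, channels, rank, h=0.1):
        super().__init__()
        assert rank < channels, "rank must be smaller than the channel count"
        self.h = h
        # One thin kernel shared by the squeeze and expand steps.
        self.K = nn.Conv3d(channels, rank, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        z = self.K(x)                                          # squeeze: C -> r channels
        z = torch.relu(z)                                      # pointwise nonlinearity
        z = F.conv_transpose3d(z, self.K.weight, padding=1)    # expand: r -> C channels
        return x - self.h * z                                  # residual update

# Toy usage: a 48-channel volumetric state compressed through rank-8 kernels.
layer = SymmetricBlockLowRankLayer(channels=48, rank=8)
y = layer(torch.randn(1, 48, 8, 32, 32))                       # (batch, channels, depth, H, W)

Sharing the same kernel between the squeeze and expand steps is what provides symmetry by construction; compared with an ordinary bottleneck that uses two independent kernels, it also roughly halves the kernel memory.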
