Reversible Fixup Networks for Memory-Efficient Training

Deep residual networks (ResNets) have been applied successfully to many computer vision tasks, but they are difficult to scale to high-resolution images or 3D volumetric datasets. Activation memory requirements, which grow with network depth and mini-batch size, quickly become prohibitive. Recently, invertible neural networks (INNs) have been applied to classification problems to reduce memory requirements during training, enabling memory complexity that is constant in depth. This is accomplished by removing the need to store the input activations computed in the forward pass and instead reconstructing them on-the-fly during the backward pass. However, existing approaches require additive coupling layers, which split activations channel-wise and add parameters beyond those of a standard ResNet. In addition, they require a sufficiently large mini-batch size for effective Batch Normalization. We propose RevUp, an invertible ResNet architecture with Fixup initialization that is memory efficient, generalizes well, removes the mini-batch dependence, and does not require the coupling-layer modifications used in previous methods. We show that RevUp achieves test accuracy competitive with comparable baselines on CIFAR-10 and ILSVRC (ImageNet).
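The memory saving described above comes from exact invertibility of each block. As a concrete illustration (our own minimal sketch, not the authors' code), the additive-coupling scheme of RevNet-style networks that the abstract contrasts against splits the input channel-wise into (x1, x2) and computes y1 = x1 + F(x2), y2 = x2 + G(y1); since the inverse x2 = y2 - G(y1), x1 = y1 - F(x2) is exact, input activations can be recomputed during backpropagation instead of stored. The placeholder branches F and G below stand in for arbitrary residual sub-networks.

```python
# Minimal sketch of an additive-coupling reversible block (RevNet-style),
# for illustration only. F and G are placeholder residual branches so the
# script is self-contained and runnable.
import numpy as np

def F(x):
    # placeholder residual branch
    return np.tanh(x)

def G(x):
    # placeholder residual branch
    return 0.5 * x

def forward(x1, x2):
    # (x1, x2) is assumed to come from a channel-wise split of the input.
    y1 = x1 + F(x2)
    y2 = x2 + G(y1)
    return y1, y2

def inverse(y1, y2):
    # Exact reconstruction: (x1, x2) need not be stored in the forward pass.
    x2 = y2 - G(y1)
    x1 = y1 - F(x2)
    return x1, x2

x1, x2 = np.random.randn(4, 8), np.random.randn(4, 8)
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
assert np.allclose(x1, r1) and np.allclose(x2, r2)
```

RevUp's stated contribution is to obtain this kind of activation recomputation without the channel-wise split and extra coupling parameters, and without Batch Normalization, by relying on Fixup initialization instead.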
