Rigging the Lottery: Making All Tickets Winners

Many applications require sparse neural networks due to space or inference-time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and a fixed computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. Our method updates the topology of the sparse network during training using parameter magnitudes and infrequent gradient calculations. We show that this approach requires fewer floating-point operations (FLOPs) to reach a given level of accuracy than prior techniques. We demonstrate state-of-the-art sparse training results on a variety of networks and datasets, including ResNet-50 and MobileNets on ImageNet-2012 and RNNs on WikiText-103. Finally, we provide some insight into why allowing the topology to change during optimization can overcome local minima encountered when the topology remains static. Code used in our work can be found at this URL.
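To make the topology update concrete, below is a minimal NumPy sketch of the kind of drop-and-grow step the abstract describes: active connections with the smallest weight magnitudes are removed, and an equal number of currently inactive connections with the largest gradient magnitudes are added. The function name `update_topology`, the default `drop_fraction`, and the tie-breaking details are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def update_topology(weights, dense_grad, mask, drop_fraction=0.3):
    """One magnitude-drop / gradient-grow step (illustrative sketch).

    weights    -- dense weight array; entries outside the mask are held at zero
    dense_grad -- gradient w.r.t. the dense weights (computed only infrequently)
    mask       -- boolean array, True where a connection is currently active
    Returns a new boolean mask with the same number of active connections.
    """
    flat_w, flat_g, flat_m = weights.ravel(), dense_grad.ravel(), mask.ravel()
    n_update = int(drop_fraction * flat_m.sum())

    # Drop: deactivate the active connections with the smallest weight magnitude.
    active_idx = np.flatnonzero(flat_m)
    dropped = active_idx[np.argsort(np.abs(flat_w[active_idx]))[:n_update]]

    # Grow: activate the same number of currently inactive connections whose
    # dense gradient has the largest magnitude.
    inactive_idx = np.flatnonzero(~flat_m)
    grown = inactive_idx[np.argsort(-np.abs(flat_g[inactive_idx]))[:n_update]]

    new_mask = flat_m.copy()
    new_mask[dropped] = False
    new_mask[grown] = True   # newly grown weights would typically start at zero
    return new_mask.reshape(mask.shape)
```

Because the drop and grow sets are the same size, the number of active parameters never changes, and between updates the network trains with a fixed mask, which is what keeps the parameter count and per-step compute constant throughout training. In practice such updates would be applied only every few hundred steps, with the updated fraction decaying over the course of training; the exact schedule here is left unspecified and should be taken from the paper.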
