Pruning neural networks without any data by iteratively conserving synaptic flow

Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory, and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we identify highly sparse trainable subnetworks at initialization, without ever training, or indeed without ever looking at the data? We provide an affirmative answer to this question through theory-driven algorithm design. We first mathematically formulate and experimentally verify a conservation law that explains why existing gradient-based pruning algorithms at initialization suffer from layer-collapse, the premature pruning of an entire layer that renders a network untrainable. This theory also elucidates how layer-collapse can be entirely avoided, motivating a novel pruning algorithm, Iterative Synaptic Flow Pruning (SynFlow). This algorithm can be interpreted as preserving the total flow of synaptic strengths through the network at initialization, subject to a sparsity constraint. Notably, it makes no reference to the training data and consistently competes with or outperforms existing state-of-the-art pruning algorithms at initialization over a range of models (VGG and ResNet), datasets (CIFAR-10/100 and Tiny ImageNet), and sparsity constraints (up to 99.99 percent). Thus our data-agnostic pruning algorithm challenges the existing paradigm that, at initialization, data must be used to quantify which synapses are important.
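
To make the "total flow of synaptic strengths" idea concrete, the sketch below scores each weight by the gradient-times-weight saliency of a data-free objective: the scalar output of the network evaluated on an all-ones input with all parameters replaced by their absolute values. This is a minimal sketch assuming a PyTorch feed-forward model; the helper names, the global-threshold masking, and the exponential sparsity schedule are illustrative assumptions rather than the authors' reference implementation.

```python
# Minimal sketch of data-free synaptic-flow scoring with iterative pruning.
# Assumes a PyTorch model in eval mode; names and schedule are illustrative.
import torch
import torch.nn as nn


@torch.no_grad()
def _linearize(model):
    """Replace every parameter/buffer by its absolute value, saving the signs."""
    signs = {}
    for name, tensor in model.state_dict().items():
        signs[name] = torch.sign(tensor)
        tensor.abs_()
    return signs


@torch.no_grad()
def _restore(model, signs):
    """Undo _linearize by multiplying the stored signs back in."""
    for name, tensor in model.state_dict().items():
        tensor.mul_(signs[name])


def synflow_scores(model, input_shape):
    """Score each weight by |dR/dw * w|, where R is the scalar output of the
    linearized network on a single all-ones input: no training data is used."""
    signs = _linearize(model)
    ones = torch.ones(1, *input_shape)          # one all-ones "image"
    model.zero_grad()
    model(ones).sum().backward()                # R = sum of all outputs
    scores = {name: (p.grad * p).abs().detach()
              for name, p in model.named_parameters() if p.grad is not None}
    _restore(model, signs)
    return scores


def iterative_synflow_prune(model, input_shape, final_density, iterations=100):
    """Prune to `final_density` over many rounds, re-scoring after each round;
    masked weights are zeroed in place, so they carry no synaptic flow later."""
    masks = {name: torch.ones_like(p) for name, p in model.named_parameters()}
    for t in range(1, iterations + 1):
        density = final_density ** (t / iterations)     # exponential schedule
        scores = synflow_scores(model, input_shape)
        # Global threshold: keep the top `density` fraction of surviving scores.
        flat = torch.cat([(scores[n] * masks[n]).flatten() for n in scores])
        k = max(1, int(density * flat.numel()))
        threshold = torch.topk(flat, k, largest=True).values.min()
        with torch.no_grad():
            for name, p in model.named_parameters():
                masks[name] = (scores[name] * masks[name] >= threshold).float()
                p.mul_(masks[name])                     # apply mask in place
    return masks
```

Because a zeroed weight receives zero flow on the next round, repeatedly re-scoring and pruning a small fraction at a time redistributes the remaining parameter budget across layers rather than exhausting any single layer, which is the intuition behind avoiding layer-collapse.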
