S-Cyc: A Learning Rate Schedule for Iterative Pruning of ReLU-based Networks

We explore a new perspective on adapting the learning rate (LR) schedule to improve the performance of ReLU-based networks as they are iteratively pruned. Our contributions consist of four parts: (i) We find that, as a ReLU-based network is iteratively pruned, the distribution of its weight gradients tends to become narrower, which suggests that a larger LR should be used to train the pruned network as it becomes sparser. (ii) Motivated by this finding, we propose a novel LR schedule, called S-Cyclical (S-Cyc), which adapts the conventional cyclical LR schedule by gradually increasing the LR upper bound (max_lr) in an S-shape as the network is iteratively pruned. We highlight that S-Cyc is a method-agnostic LR schedule that applies to many iterative pruning methods. (iii) We evaluate the proposed S-Cyc against four LR schedule benchmarks. Our experimental results on three state-of-the-art networks (VGG-19, ResNet-20, ResNet-50) and two popular datasets (CIFAR-10, ImageNet-200) demonstrate that S-Cyc consistently outperforms the best-performing benchmark, with an improvement of 2.1% to 3.4%, without a substantial increase in complexity. (iv) We evaluate S-Cyc against an oracle that carefully tunes max_lr via grid search, and show that S-Cyc achieves comparable performance to the oracle.
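To make the schedule concrete, below is a minimal sketch of an S-Cyc-style schedule. The exact parameterization is not specified here, so the logistic curve used for the S-shaped growth of max_lr, the function names (`s_shaped_max_lr`, `cyclical_lr`), and all parameter values (`lr_low`, `lr_high`, `steepness`, `step_size`) are illustrative assumptions, not the paper's definitive implementation. Within each pruning round, the LR follows a standard triangular cyclical schedule between a fixed base LR and that round's upper bound.

```python
import math

def s_shaped_max_lr(round_idx, num_rounds, lr_low=0.01, lr_high=0.1, steepness=10.0):
    """Upper LR bound for a given pruning round.

    The logistic (S-shaped) growth from lr_low to lr_high over the pruning
    rounds is an assumed parameterization for illustration.
    """
    x = round_idx / max(num_rounds - 1, 1)               # pruning progress in [0, 1]
    s = 1.0 / (1.0 + math.exp(-steepness * (x - 0.5)))   # S-shaped curve in (0, 1)
    return lr_low + (lr_high - lr_low) * s

def cyclical_lr(step, step_size, base_lr, max_lr):
    """Triangular cyclical LR within one training round (Smith, 2017)."""
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# Example: LR at training step 500 of pruning round 3 (out of 10 rounds).
max_lr_t = s_shaped_max_lr(round_idx=3, num_rounds=10)
lr = cyclical_lr(step=500, step_size=2000, base_lr=0.001, max_lr=max_lr_t)
```

The key design point is that only the upper bound max_lr changes across pruning rounds; the cyclical behavior within a round is left unchanged, which is what makes the schedule compatible with many existing iterative pruning methods.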
