Lottery Ticket Implies Accuracy Degradation, Is It a Desirable Phenomenon?

In deep model compression, the recent finding of the “Lottery Ticket Hypothesis” (LTH) (Frankle & Carbin, 2018) pointed out that there may exist a winning ticket (i.e., a properly pruned subnetwork together with the original weight initialization) that can achieve performance competitive with the original dense network. However, such a winning property is hard to observe in many scenarios, for example when a relatively large learning rate is used, even though a large learning rate benefits training the original dense model. In this work, we investigate the underlying condition and rationale behind the winning property, and find that it is largely attributable to the correlation between the initialized weights and the final trained weights when the learning rate is not sufficiently large. Thus, the existence of the winning property is correlated with insufficient DNN pretraining and is unlikely to occur for a well-trained DNN. To overcome this limitation, we propose a “pruning & fine-tuning” method that consistently outperforms lottery-ticket sparse training under the same pruning algorithm and the same total number of training epochs. Extensive experiments over multiple deep models (VGG, ResNet, MobileNetv2) on different datasets have been conducted to justify our proposals.
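The distinction between the two retraining schemes compared above can be made concrete with a small sketch. The following is a minimal, illustrative PyTorch example, not the paper's exact algorithm or experimental setup: global magnitude pruning produces a sparsity mask, and the resulting subnetwork is then either rewound to its saved initialization and retrained (lottery-ticket sparse training) or trained further from the dense model's weights (pruning & fine-tuning). `make_model`, `loader`, the sparsity level, and the learning rates are placeholders assumed for illustration.

```python
# Minimal sketch contrasting lottery-ticket sparse training with pruning & fine-tuning.
# Assumptions: `make_model()` builds the network, `loader` yields (x, y) batches,
# and the sparsity/learning-rate values are arbitrary placeholders.
import copy
import torch
import torch.nn as nn

def magnitude_masks(model: nn.Module, sparsity: float) -> dict:
    """Global unstructured magnitude pruning: zero out the smallest-|w| weights."""
    scores = torch.cat([p.detach().abs().flatten()
                        for n, p in model.named_parameters() if p.dim() > 1])
    k = int(sparsity * scores.numel())
    if k == 0:
        return {n: torch.ones_like(p)
                for n, p in model.named_parameters() if p.dim() > 1}
    threshold = torch.kthvalue(scores, k).values
    return {n: (p.detach().abs() > threshold).float()
            for n, p in model.named_parameters() if p.dim() > 1}

def train_with_masks(model, masks, loader, epochs, lr):
    """SGD training that keeps masked (pruned) weights fixed at zero."""
    def apply_masks():
        with torch.no_grad():
            for n, p in model.named_parameters():
                if n in masks:
                    p.mul_(masks[n])
    apply_masks()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
            apply_masks()  # re-zero pruned weights after every update
    return model

# Usage sketch:
# init_state  = copy.deepcopy(make_model().state_dict())          # saved initialization
# dense_model = train_with_masks(make_model(), {}, loader, epochs=E, lr=0.1)
# masks       = magnitude_masks(dense_model, sparsity=0.9)
#
# (a) Lottery-ticket sparse training: rewind to the initialization, retrain the subnetwork.
# ticket = make_model(); ticket.load_state_dict(init_state)
# train_with_masks(ticket, masks, loader, epochs=E, lr=0.1)
#
# (b) Pruning & fine-tuning: keep the trained weights and continue training the subnetwork.
# finetuned = copy.deepcopy(dense_model)
# train_with_masks(finetuned, masks, loader, epochs=E_ft, lr=0.01)
```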

[1] Niraj K. Jha, et al. NeST: A Neural Network Synthesis Tool Based on a Grow-and-Prune Paradigm, 2017, IEEE Transactions on Computers.

[2] Shiyu Chang, et al. The Lottery Ticket Hypothesis for Pre-trained BERT Networks, 2020, NeurIPS.

[3] Daniel L. K. Yamins, et al. Pruning neural networks without any data by iteratively conserving synaptic flow, 2020, NeurIPS.

[4] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[5] Yuandong Tian, et al. One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers, 2019, NeurIPS.

[6] Yiran Chen, et al. Learning Structured Sparsity in Deep Neural Networks, 2016, NIPS.

[7] Rongrong Ji, et al. HRank: Filter Pruning Using High-Rank Feature Map, 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8] David J. Schwab, et al. The Early Phase of Neural Network Training, 2020, ICLR.

[9] Mehul Motani, et al. DropNet: Reducing Neural Network Complexity via Iterative Pruning, 2020, ICML.

[10] Gintare Karolina Dziugaite, et al. Stabilizing the Lottery Ticket Hypothesis, 2019.

[11] Yiran Chen, et al. 2PFPCE: Two-Phase Filter Pruning Based on Conditional Entropy, 2018, arXiv.

[12] Adam R. Klivans, et al. Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection, 2020, ICML.

[13] Alexander G. Gray, et al. Stochastic Alternating Direction Method of Multipliers, 2013, ICML.

[14] Jieping Ye, et al. AutoCompress: An Automatic DNN Structured Pruning Framework for Ultra-High Compression Rates, 2020, AAAI.

[15] Song Han, et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding, 2015, ICLR.

[16] Stephen P. Boyd, et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, 2011, Found. Trends Mach. Learn.

[17] Xiangyu Zhang, et al. Channel Pruning for Accelerating Very Deep Neural Networks, 2017, IEEE International Conference on Computer Vision (ICCV).

[18] Ping Liu, et al. Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration, 2019, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Song Han, et al. Learning both Weights and Connections for Efficient Neural Network, 2015, NIPS.

[20] Jiayu Li, et al. ADMM-NN: An Algorithm-Hardware Co-Design Framework of DNNs Using Alternating Direction Methods of Multipliers, 2018, ASPLOS.

[21] Yue Wang, et al. Drawing early-bird tickets: Towards more efficient training of deep networks, 2019, ICLR.

[22] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.

[23] Michael Carbin, et al. Comparing Rewinding and Fine-tuning in Neural Network Pruning, 2019, ICLR.

[24] Yanzhi Wang, et al. Systematic Weight Pruning of DNNs using Alternating Direction Method of Multipliers, 2018, ICLR.

[25] Gilad Yehudai, et al. Proving the Lottery Ticket Hypothesis: Pruning is All You Need, 2020, ICML.

[26] Gintare Karolina Dziugaite, et al. Pruning Neural Networks at Initialization: Why are We Missing the Mark?, 2020, arXiv.

[27] Mingjie Sun, et al. Rethinking the Value of Network Pruning, 2018, ICLR.

[28] Hanwang Zhang, et al. Learning Filter Pruning Criteria for Deep Convolutional Neural Networks Acceleration, 2020, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).