BlockSwap: Fisher-guided Block Substitution for Network Compression

The desire to map neural networks to varying-capacity devices has led to the development of a wealth of compression techniques, many of which involve replacing standard convolutional blocks in a large network with cheap alternative blocks. However, not all blocks are created equal; for a required compute budget there may exist a potent combination of many different cheap blocks, though exhaustively searching for such a combination is prohibitively expensive. In this work, we develop BlockSwap: a fast algorithm for choosing networks with interleaved block types by passing a single minibatch of training data through randomly initialised networks and gauging their Fisher potential. These networks can then be used as students and distilled with the original large network as a teacher. We demonstrate the effectiveness of the chosen networks across CIFAR-10 and ImageNet for classification, and COCO for detection, and provide a comprehensive ablation study of our approach. BlockSwap quickly explores possible block configurations using a simple architecture ranking system, yielding highly competitive networks in orders of magnitude less time than most architecture search techniques (e.g. 8 minutes on a single CPU for CIFAR-10).
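To make the ranking step concrete, the sketch below (PyTorch; not the authors' released code) scores one randomly initialised candidate by an estimate of its Fisher potential on a single minibatch: activations are recorded at each ReLU, the activation-gradient product is summed per channel, squared, and averaged over the batch, and the per-channel scores are accumulated into a single number that can be compared across candidates. The choice of hooking ReLU outputs, the 1/2 factor and batch averaging, and the toy candidate architectures are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fisher_potential(model: nn.Module,
                     images: torch.Tensor,
                     labels: torch.Tensor) -> float:
    """Score a randomly initialised candidate by summing per-channel Fisher
    sensitivities of its (non-inplace) ReLU activations on one minibatch."""
    acts = []

    def save_activation(module, inputs, output):
        # Keep the activation and ask autograd to populate .grad for it,
        # even though it is a non-leaf tensor.
        output.retain_grad()
        acts.append(output)

    handles = [m.register_forward_hook(save_activation)
               for m in model.modules() if isinstance(m, nn.ReLU)]

    model.zero_grad()
    loss = F.cross_entropy(model(images), labels)
    loss.backward()

    score = 0.0
    for a in acts:
        g = a.grad                                  # dL/da, shape (N, C, H, W)
        per_example = (a * g).sum(dim=(2, 3))       # sum over spatial positions
        fisher = 0.5 * per_example.pow(2).mean(0)   # batch-averaged squared value
        score += fisher.sum().item()                # accumulate over channels

    for h in handles:
        h.remove()
    return score


if __name__ == "__main__":
    # Toy usage: rank two hypothetical candidate blocks on a CIFAR-sized minibatch.
    images = torch.randn(32, 3, 32, 32)
    labels = torch.randint(0, 10, (32,))
    candidates = [
        nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10)),
        nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 16, 3, padding=1, groups=16), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10)),
    ]
    scores = [fisher_potential(m, images, labels) for m in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    print(f"Fisher potentials: {scores}, picking candidate {best}")
```

In practice one would sample many block configurations that satisfy the target parameter or FLOP budget, score each with a call like the one above, and then distil the highest-scoring candidate from the original large network as teacher.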
