ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

We introduce an extremely computation-efficient CNN architecture named ShuffleNet, which is designed specially for mobile devices with very limited computing power (e.g., 10-150 MFLOPs). The new architecture utilizes two new operations, pointwise group convolution and channel shuffle, to greatly reduce computation cost while maintaining accuracy. Experiments on ImageNet classification and MS COCO object detection demonstrate the superior performance of ShuffleNet over other structures, e.g. lower top-1 error (absolute 7.8%) than recent MobileNet [12] on ImageNet classification task, under the computation budget of 40 MFLOPs. On an ARM-based mobile device, ShuffleNet achieves ~13× actual speedup over AlexNet while maintaining comparable accuracy.

[1]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[2]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[3]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[4]  Andrew Zisserman,et al.  Speeding up Convolutional Neural Networks with Low Rank Expansions , 2014, BMVC.

[5]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[7]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[8]  Yann LeCun,et al.  Fast Training of Convolutional Networks through FFTs , 2013, ICLR.

[9]  Ron Meir,et al.  Expectation Backpropagation: Parameter-Free Training of Multilayer Neural Networks with Continuous or Discrete Weights , 2014, NIPS.

[10]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[11]  Song Han,et al.  Learning both Weights and Connections for Efficient Neural Network , 2015, NIPS.

[12]  Eugenio Culurciello,et al.  Flattened Convolutional Neural Networks for Feedforward Acceleration , 2014, ICLR.

[13]  Jeff Johnson,et al.  Fast Convolutional Nets With fbfft: A GPU Performance Evaluation , 2014, ICLR.

[14]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[16]  Ivan V. Oseledets,et al.  Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition , 2014, ICLR.

[17]  Jian Sun,et al.  Convolutional neural networks at constrained time cost , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Jian Sun,et al.  Efficient and accurate approximations of nonlinear convolutional networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Samy Bengio,et al.  Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[22]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[23]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Song Han,et al.  Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[26]  Yiran Chen,et al.  Learning Structured Sparsity in Deep Neural Networks , 2016, NIPS.

[27]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[28]  Shuchang Zhou,et al.  DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients , 2016, ArXiv.

[29]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[30]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[31]  Ali Farhadi,et al.  XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016, ECCV.

[32]  Jian Sun,et al.  Accelerating Very Deep Convolutional Networks for Classification and Detection , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Yeongjae Cheon,et al.  PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection , 2016, ArXiv.

[34]  Hassan Foroosh,et al.  Design of Efficient Convolutional Layers using Single Intra-channel Convolution, Topological Subdivisioning and Spatial "Bottleneck" Structure , 2016, 1608.04337.

[35]  Jian Cheng,et al.  Quantized Convolutional Neural Networks for Mobile Devices , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Jingdong Wang,et al.  Interleaved Group Convolutions for Deep Neural Networks , 2017, ArXiv.

[37]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[39]  Lin Xu,et al.  Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights , 2017, ICLR.

[40]  Quoc V. Le,et al.  Swish: a Self-Gated Activation Function , 2017, 1710.05941.

[41]  François Chollet,et al.  Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[43]  Ali Farhadi,et al.  LCNN: Lookup-Based Convolutional Neural Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Roberto Cipolla,et al.  Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Zhuowen Tu,et al.  Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Quoc V. Le,et al.  Searching for Activation Functions , 2018, arXiv.

[47]  Vijay Vasudevan,et al.  Learning Transferable Architectures for Scalable Image Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.