Deep Convolutional Neural Network on iOS Mobile Devices

Deep Convolutional Neural Networks (CNNs) have drawn significant attention in the computer vision community by endowing machines with greater intelligence in understanding visual signals; however, their computational complexity has also grown substantially. To achieve ubiquitous machine intelligence with low latency, deep CNNs must be ported onto local devices rather than relying on cloud-based solutions. Hence, in this paper, we propose a method to explore the design space for porting deep CNNs onto iOS mobile devices, aiming to maximize data reusability and thereby alleviate the high bandwidth burden of the convolution layers. Effective data reuse also allows all computing threads to run in parallel without data-loading latency. Furthermore, deep CNNs are usually over-parameterized, containing many unused convolution kernels. Based on Algorithm/Architecture Co-Exploration, we introduce a method for pruning redundant kernels in deep CNNs with negligible performance degradation on the validation dataset (0.06% loss), reducing operations by 29% and storage size by 34% on a 16-layer CNN. Using an iPhone 6s and an iPad Pro as case studies, we ported 8-layer and 16-layer CNNs onto the target devices. The data-reusability strategy improves computation speed by up to 1.3x, and redundant-kernel removal further increases it to 1.43x. As a result, we achieve high computational efficiency and thus enhance the capability of machine intelligence on local mobile devices.
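The abstract's kernel-pruning idea can be illustrated with a minimal sketch. The paper's actual Algorithm/Architecture Co-Exploration criterion for identifying redundant kernels is not detailed here, so this sketch substitutes a common stand-in: rank each convolution kernel by its L1 norm and discard the smallest ones. All names and the toy layer below are hypothetical.

```python
def kernel_l1_norm(kernel):
    """Sum of absolute weights of one convolution kernel (nested lists)."""
    if isinstance(kernel, list):
        return sum(kernel_l1_norm(k) for k in kernel)
    return abs(kernel)

def prune_kernels(kernels, keep_ratio):
    """Keep the `keep_ratio` fraction of kernels with the largest L1 norm.

    Returns (kept_kernels, kept_indices). The indices are needed downstream:
    removing a kernel in layer N also removes the corresponding input
    channel of layer N+1, which is where the operation/storage savings
    compound across a deep network.
    """
    norms = sorted(((kernel_l1_norm(k), i) for i, k in enumerate(kernels)),
                   reverse=True)
    n_keep = max(1, int(len(kernels) * keep_ratio))
    kept_indices = sorted(i for _, i in norms[:n_keep])
    return [kernels[i] for i in kept_indices], kept_indices

# Toy layer: four 2x2 kernels; two are near-zero ("redundant").
layer = [
    [[0.9, -0.8], [0.7, 0.6]],
    [[0.01, 0.0], [-0.02, 0.01]],
    [[-0.5, 0.4], [0.3, -0.6]],
    [[0.0, 0.01], [0.01, -0.01]],
]
pruned, kept = prune_kernels(layer, keep_ratio=0.5)
print(kept)  # -> [0, 2]
```

In a real deployment the pruned model would then be fine-tuned on the training set to recover the small accuracy loss, which is presumably how the paper reaches its reported 0.06% degradation.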
