Cappuccino: Efficient CNN Inference Software Synthesis for Mobile System-on-Chips

Convolutional neural networks (CNNs) exhibit remarkable performance in various machine learning tasks. As sensor-equipped Internet of Things devices permeate into every aspect of modern life, the ability to execute CNN inference, a computationally intensive application, on resource constrained devices has become increasingly important. In this context, we present Cappuccino, a framework for synthesis of efficient inference software targeting mobile system-on-chips (SoCs). We propose techniques for efficient parallelization of CNN inference targeting mobile SoCs, and explore the underlying tradeoffs. Experiments with different CNNs on three mobile devices demonstrate the effectiveness of our approach.

[1]  Jason Cong,et al.  Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks , 2015, FPGA.

[2]  Soheil Ghiasi,et al.  Design space exploration of FPGA-based Deep Convolutional Neural Networks , 2016, 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC).

[3]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[5]  Eunhyeok Park,et al.  Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications , 2015, ICLR.

[6]  Clément Farabet,et al.  Torch7: A Matlab-like Environment for Machine Learning , 2011, NIPS 2011.

[7]  Yu Wang,et al.  Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.

[8]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[9]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[10]  Soheil Ghiasi,et al.  Machine Intelligence on Resource-Constrained IoT Devices , 2017, ACM Trans. Embed. Comput. Syst..

[11]  Moustafa Alzantot,et al.  RSTensorFlow: GPU Enabled TensorFlow for Deep Learning on Commodity Android Devices , 2017, EMDL '17.

[12]  Jia Wang,et al.  DaDianNao: A Machine-Learning Supercomputer , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.

[13]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[14]  Soheil Ghiasi,et al.  CNNdroid: GPU-Accelerated Execution of Trained Deep Convolutional Neural Networks on Android , 2015, ACM Multimedia.

[15]  Jun Zhou,et al.  Use of SIMD Vector Operations to Accelerate Application Code Performance on Low-Powered ARM and Intel Platforms , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.

[16]  Jason Cong,et al.  Minimizing Computation in Convolutional Neural Networks , 2014, ICANN.