AI Benchmark: All About Deep Learning on Smartphones in 2019

The performance of mobile AI accelerators has been evolving rapidly in the past two years, nearly doubling with each new generation of SoCs. The current 4th generation of mobile NPUs is already approaching the results of CUDA-compatible Nvidia graphics cards presented not long ago, which together with the increased capabilities of mobile deep learning frameworks makes it possible to run complex and deep AI models on mobile devices. In this paper, we evaluate the performance and compare the results of all chipsets from Qualcomm, HiSilicon, Samsung, MediaTek and Unisoc that are providing hardware acceleration for AI inference. We also discuss the recent changes in the Android ML pipeline and provide an overview of the deployment of deep learning models on mobile devices. All numerical results provided in this paper can be found and are regularly updated on the official project website: http://ai-benchmark.com.

[1]  Gary M. Weiss,et al.  Activity recognition using cell phone accelerometers , 2011, SKDD.

[2]  Tomasz Malisiewicz,et al.  Deep Image Homography Estimation , 2016, ArXiv.

[3]  Luc Van Gool,et al.  DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[4]  Gerhard P. Hancke,et al.  Gesture recognition as ubiquitous input for mobile phones , 2008 .

[5]  Luc Van Gool,et al.  WESPE: Weakly Supervised Photo Enhancer for Digital Cameras , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[6]  Matti Pietikäinen,et al.  Face and Eye Detection for Person Authentication in Mobile Phones , 2007, 2007 First ACM/IEEE International Conference on Distributed Smart Cameras.

[7]  Toru Irie,et al.  Universal design activities for mobile phone : Raku Raku PHONE , 2005 .

[8]  Fahad Shahbaz Khan,et al.  NTIRE 2019 Challenge on Image Enhancement: Methods and Results , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[9]  Ke Wang,et al.  AI Benchmark: Running Deep Neural Networks on Android Smartphones , 2018, ECCV Workshops.

[10]  Luc Van Gool,et al.  PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report , 2018, ECCV Workshops.

[11]  Bhaskar Saha,et al.  An Adaptive Recurrent Neural Network for Remaining Useful Life Prediction of Lithium-ion Batteries , 2010 .

[12]  I. Scott MacKenzie,et al.  Predicting text entry speed on mobile phones , 2000, CHI.

[13]  Éric Anquetil,et al.  Integration of an on-line handwriting recognition system in a smart phone device , 2002, Object recognition supported by user interaction for service robots.

[14]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[15]  Hang Li,et al.  Convolutional Neural Network Architectures for Matching Natural Language Sentences , 2014, NIPS.

[16]  Andrew Y. Ng,et al.  Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[17]  Max Welling,et al.  Relaxed Quantization for Discretized Neural Networks , 2018, ICLR.

[18]  Simon Dixon,et al.  Improved music feature learning with deep neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[19]  Yair Movshovitz-Attias,et al.  Synthetic depth-of-field with a single-camera mobile phone , 2018, ACM Trans. Graph..

[20]  Dmitry Yu. Ignatov,et al.  Decision Stream: Cultivating Deep Decision Trees , 2017, 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI).

[21]  Shafiq R. Joty,et al.  Sleep Quality Prediction From Wearable Data Using Deep Learning , 2016, JMIR mHealth and uHealth.

[22]  Ming-Hsuan Yang,et al.  Object Tracking Benchmark , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[24]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[25]  Andrey Ignatov,et al.  Real-time human activity recognition from accelerometer data using Convolutional Neural Networks , 2018, Appl. Soft Comput..

[26]  Masashi Koga,et al.  Camera-based Kanji OCR for mobile-phones: practical issues , 2005, Eighth International Conference on Document Analysis and Recognition (ICDAR'05).

[27]  Luc Van Gool,et al.  NTIRE 2018 Challenge on Single Image Super-Resolution: Methods and Results , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[28]  Raghuraman Krishnamoorthi,et al.  Quantizing deep convolutional networks for efficient inference: A whitepaper , 2018, ArXiv.

[29]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  John Tran,et al.  cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.

[31]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[32]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[33]  Rajat Raina,et al.  Large-scale deep unsupervised learning using graphics processors , 2009, ICML '09.

[34]  Sergio Guadarrama,et al.  Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  George Kurian,et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[36]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[37]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Taesung Park,et al.  Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Joelle Pineau,et al.  A Deep Reinforcement Learning Chatbot , 2017, ArXiv.

[40]  Koray Kavukcuoglu,et al.  Pixel Recurrent Neural Networks , 2016, ICML.

[41]  Jae-Gon Lee,et al.  7.1 An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC , 2019, 2019 IEEE International Solid- State Circuits Conference - (ISSCC).

[42]  George Papandreou,et al.  Weakly-and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[43]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[44]  Luca Maria Gambardella,et al.  Flexible, High Performance Convolutional Neural Networks for Image Classification , 2011, IJCAI.

[45]  Florian Michahelles,et al.  Evaluation of 1D barcode scanning on mobile phones , 2010, 2010 Internet of Things (IOT).

[46]  Xiaojuan Qi,et al.  ICNet for Real-Time Semantic Segmentation on High-Resolution Images , 2017, ECCV.

[47]  George Theocharous,et al.  Machine Learning for Adaptive Power Management , 2006 .

[48]  Juhyun Lee,et al.  On-Device Neural Net Inference with Mobile GPUs , 2019, ArXiv.

[49]  Andreas Geiger,et al.  Augmented Reality meets Deep Learning , 2017, BMVC.

[50]  Luca Maria Gambardella,et al.  Max-pooling convolutional neural networks for vision-based hand gesture recognition , 2011, 2011 IEEE International Conference on Signal and Image Processing Applications (ICSIPA).

[51]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Bo Chen,et al.  Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53]  Xiaoou Tang,et al.  Image Super-Resolution Using Deep Convolutional Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[55]  Daniel Roggen,et al.  Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition , 2016, Sensors.

[56]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[57]  Zhen Wang,et al.  uWave: Accelerometer-based Personalized Gesture Recognition and Its Applications , 2009, PerCom.

[58]  Yasushi Makihara,et al.  Object recognition supported by user interaction for service robots , 2002, Object recognition supported by user interaction for service robots.

[59]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[61]  Alessandro Moschitti,et al.  Twitter Sentiment Analysis with Deep Convolutional Neural Networks , 2015, SIGIR.

[62]  Luc Van Gool,et al.  NTIRE 2017 Challenge on Single Image Super-Resolution: Methods and Results , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[63]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[64]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[65]  Gang Hua,et al.  A convolutional neural network cascade for face detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[66]  Emiliano Miluzzo,et al.  EyePhone: activating mobile phones with your eyes , 2010, MobiHeld '10.

[67]  Christopher Potts,et al.  Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.

[68]  Markus Nagel,et al.  Data-Free Quantization Through Weight Equalization and Bias Correction , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[69]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[70]  Avery Wang,et al.  The Shazam music recognition service , 2006, CACM.

[71]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[72]  Kyoung Mu Lee,et al.  Accurate Image Super-Resolution Using Very Deep Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).