MLPerf Mobile Inference Benchmark: Why Mobile AI Benchmarking Is Hard and What to Do About It

MLPerf Mobile is the first industry-standard open-source mobile benchmark developed by industry members and academic researchers to allow performance/accuracy evaluation of mobile devices with different AI chips and software stacks. The benchmark draws from the expertise of leading mobile-SoC vendors, ML-framework providers, and model producers. In this paper, we motivate the drive to demystify mobile-AI performance and present MLPerf Mobile's design considerations, architecture, and implementation. The benchmark comprises a suite of models that operate under standard models, data sets, quality metrics, and run rules. For the first iteration, we developed an app to provide an "out-of-the-box" inference-performance benchmark for computer vision and natural-language processing on mobile devices. MLPerf Mobile can serve as a framework for integrating future models, for customizing quality-target thresholds to evaluate system performance, for comparing software frameworks, and for assessing heterogeneous-hardware capabilities for machine learning, all fairly and faithfully with fully reproducible results.

[1]  Yiming Yang,et al.  MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices , 2020, ACL.

[2]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[3]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[4]  Jordi Pont-Tuset,et al.  The Open Images Dataset V4 , 2018, International Journal of Computer Vision.

[5]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[6]  Yann LeCun,et al.  Predicting Deeper into the Future of Semantic Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[7]  Luke S. Zettlemoyer,et al.  Deep Contextualized Word Representations , 2018, NAACL.

[8]  Quoc V. Le,et al.  Semi-supervised Sequence Learning , 2015, NIPS.

[9]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[10]  Bo Chen,et al.  MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.

[11]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[12]  Huei-Yung Lin,et al.  Robotic Grasping Using Semantic Segmentation and Primitive Geometric Model Based 3D Pose Estimation , 2020, 2020 IEEE/SICE International Symposium on System Integration (SII).

[13]  Julien Cohen-Adad,et al.  Deep semantic segmentation of natural and medical images: a review , 2019, Artificial Intelligence Review.

[14]  Mark Sandler,et al.  MobileNetV2: Inverted Residuals and Linear Bottlenecks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Alec Radford,et al.  Improving Language Understanding by Generative Pre-Training , 2018 .

[18]  Jian Zhang,et al.  SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.

[19]  Martin Jägersand,et al.  Deep semantic segmentation for automated driving: Taxonomy, roadmap and challenges , 2017, 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC).

[20]  Chia-Hung Liu,et al.  7.1 A 3.4-to-13.3TOPS/W 3.6TOPS Dual-Core Deep-Learning Accelerator for Versatile AI Applications in 7nm 5G Smartphone SoC , 2020, 2020 IEEE International Solid- State Circuits Conference - (ISSCC).

[21]  Song Han,et al.  Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[22]  김용한,et al.  Single Shot Multibox Detector를 적용한 실시간 선박 검출 및 분류 시스템 개발 , 2020 .

[23]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[24]  Rob Fergus,et al.  Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-scale Convolutional Architecture , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[25]  Xavier Vera Inside Tiger Lake: Intel’s Next Generation Mobile Client CPU , 2020, 2020 IEEE Hot Chips 32 Symposium (HCS).

[26]  Luc Van Gool,et al.  AI Benchmark: All About Deep Learning on Smartphones in 2019 , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[27]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Junhee Seok,et al.  Indoor Semantic Segmentation for Robot Navigating on Mobile , 2018, 2018 Tenth International Conference on Ubiquitous and Future Networks (ICUFN).

[29]  Bolei Zhou,et al.  Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Jamie Sherrah,et al.  Fully Convolutional Networks for Dense Semantic Labelling of High-Resolution Aerial Imagery , 2016, ArXiv.

[32]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).