LETI: Latency Estimation Tool and Investigation of Neural Networks Inference on Mobile GPU

Many deep learning applications need to run on mobile devices, where both accuracy and inference time matter. The number of FLOPs is commonly used as a proxy for neural network latency, but it is often a poor one. To obtain a better approximation, the research community builds look-up tables of the latencies of all possible layers and sums them to predict inference time on mobile CPUs; this requires only a small number of experiments. Unfortunately, this method is not directly applicable to mobile GPUs and shows low precision there. In this work, we treat latency approximation on mobile GPU as a data- and hardware-specific problem. Our main goal is to construct a convenient Latency Estimation Tool for Investigation (LETI) of neural network inference and to build robust and accurate latency prediction models for each specific task. To achieve this goal, we provide open-source tools for conducting massive experiments on different target devices, with a focus on mobile GPUs. After collecting the experimental dataset, we train a regression model on the measured data and use it for subsequent latency prediction and analysis. We experimentally demonstrate the applicability of this approach on a subset of the popular NAS-Bench-101 dataset and also evaluate the most popular neural network architectures on two mobile GPUs. As a result, we obtain a latency prediction model with good precision on the target evaluation subset. We consider LETI a useful tool for neural architecture search and massive latency evaluation. The project is available at this https URL
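As an illustration of the prediction step described above, the sketch below trains a generic gradient-boosting regressor on architecture descriptors to predict measured GPU latency. This is a minimal sketch under stated assumptions: the feature set, the synthetic data, and the choice of regressor are placeholders for illustration, not the exact features or model used by LETI.

```python
# Hypothetical sketch: predict per-network mobile-GPU latency from simple
# architecture descriptors with a regression model trained on measured data.
# All features and data below are illustrative placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(0)

# Toy stand-in for a measured dataset: each row describes one architecture
# (e.g. depth, width, FLOPs, number of conv layers); each target is the
# latency in milliseconds measured on the target device.
n_samples = 500
X = rng.uniform(size=(n_samples, 4))                   # placeholder architecture features
y = 5.0 + 20.0 * X[:, 2] + rng.normal(scale=0.5, size=n_samples)  # synthetic latency (ms)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit the regressor on measured architectures, then predict latency for unseen ones.
model = GradientBoostingRegressor(n_estimators=300, max_depth=4)
model.fit(X_train, y_train)

pred = model.predict(X_test)
print(f"MAPE on held-out architectures: {mean_absolute_percentage_error(y_test, pred):.3f}")
```

In practice the held-out error would be reported per target device, since the same architecture can have very different latency profiles on different mobile GPUs.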
