Support NNEF execution model for NNAPI

With the growth of applications such as image recognition, speech recognition, advanced driver-assistance systems (ADAS), and the artificial intelligence of things (AIoT), artificial intelligence (AI) frameworks have become popular across many industries. Many neural network frameworks are now available for training and deploying AI models, including TensorFlow, Caffe, MXNet, PyTorch, Core ML, TensorFlow Lite, and NNAPI. With so many frameworks emerging, exchange formats are needed to move models between them. To meet this need, the Khronos Group created a standard draft known as the Neural Network Exchange Format (NNEF). However, because NNEF is new, conversion tools that would allow models to be exchanged with these various AI frameworks are still missing. In this work, we fill this gap by devising NNAPI conversion tools for NNEF. Our work allows NNEF models to execute inference tasks on host and Android platforms and flexibly invokes the Android Neural Networks API (NNAPI) on Android to speed up inference. We invoke NNAPI by dividing the input NNEF model into multiple submodels and letting NNAPI execute them. To decide how to divide the input model, we develop an algorithm named BFSelector, which is based on a classic breadth-first search and incorporates cost constraints. Our preliminary experimental results show that our support of NNEF on NNAPI achieves a speedup of 1.32 to 22.52 times over the baseline for API level 27 and of 4.56 to 211 times for API level 28, where the baseline is the NNEF-to-Android conversion without invoking NNAPI. The experiments cover AI models including LeNet, AlexNet, MobileNet_V1, MobileNet_V2, VGG-16, and VGG-19.
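The abstract does not give BFSelector's pseudocode, but the core idea of a breadth-first, cost-constrained partitioner can be sketched as follows. This is a minimal illustrative sketch only, not the authors' implementation: the graph representation (node id mapped to operator name, cost, and successors), the per-submodel cost budget MAX_COST, and the nnapi_supported / op_cost helpers are all assumptions made for illustration.

```python
# Hypothetical sketch of a BFS-based, cost-constrained model partitioner
# in the spirit of BFSelector. All names and the cost model are assumed.
from collections import deque

MAX_COST = 100.0  # assumed per-submodel cost budget


def nnapi_supported(op):
    # Assumed predicate: whether NNAPI implements this operator.
    return op in {"CONV_2D", "RELU", "MAX_POOL_2D", "FULLY_CONNECTED"}


def op_cost(node):
    # Assumed cost model, e.g. an estimate of the node's FLOPs.
    return node.get("cost", 1.0)


def bf_selector(graph, inputs):
    """Partition `graph` (node id -> {"op", "cost", "succs"}) into
    submodels by breadth-first traversal from `inputs`, cutting a new
    submodel when an op is unsupported or the cost budget is exceeded."""
    visited, submodels = set(), []
    current, current_cost = [], 0.0
    queue = deque(inputs)
    while queue:
        nid = queue.popleft()
        if nid in visited:
            continue
        visited.add(nid)
        node = graph[nid]
        # Cut the partition when NNAPI cannot run the op or when the
        # accumulated cost would exceed the budget.
        if (not nnapi_supported(node["op"])
                or current_cost + op_cost(node) > MAX_COST):
            if current:
                submodels.append(current)
            current, current_cost = [], 0.0
        if nnapi_supported(node["op"]):
            current.append(nid)
            current_cost += op_cost(node)
        else:
            submodels.append([nid])  # unsupported op stays on the host path
        queue.extend(node["succs"])
    if current:
        submodels.append(current)
    return submodels
```

Under these assumptions, a cut is made whenever the traversal reaches an operator NNAPI cannot run or the accumulated cost would exceed the budget; each NNAPI-compatible submodel can then be handed to NNAPI for execution, while unsupported operators fall back to the host path.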
