Analyzing Deep Learning Model Inferences for Image Classification Using OpenVINO

It may be desirable to execute deep learning model inferences on an integrated GPU at the edge. Although such GPUs are much less powerful than discrete GPUs, they can deliver more floating-point operations per second than the CPU located on the same die. For edge devices, moving to lower precision to obtain higher performance with minimal loss of accuracy is also attractive. Hence, we chose 14 deep learning models for image classification and evaluated their inference performance with the OpenVINO toolkit. We then analyzed the implementation of the model with the fastest inference. The experimental results are promising. Compared with the full-precision (FP32) models, the 8-bit (INT8) quantized models achieve speedups ranging from 1.02× to 1.56× on an Intel® Xeon® 4-core CPU, and the FP16 models achieve speedups ranging from 1.1× to 2× on an Intel® Iris™ Pro GPU. For the FP32 models, the GPU is on average 1.5× faster than the CPU.
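The speedups quoted above are ratios of execution time between a baseline (FP32, or the CPU) and an alternative configuration (INT8, FP16, or the GPU). The abstract does not state how the per-model figures are averaged. The following is therefore only a minimal sketch. It assumes per-model latencies in milliseconds and reports both an arithmetic and a geometric mean. The function names, model names, and values are hypothetical placeholders, not measurements from this study.

```python
from __future__ import annotations

from statistics import geometric_mean, mean


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Latency ratio baseline/optimized; a value above 1 means the optimized run is faster."""
    if baseline_ms <= 0 or optimized_ms <= 0:
        raise ValueError("latencies must be positive")
    return baseline_ms / optimized_ms


def summarize(baseline: dict[str, float], optimized: dict[str, float]) -> dict[str, float]:
    """Reduce per-model speedups to min, max, arithmetic mean and geometric mean.

    Both dicts map a (hypothetical) model name to its latency in milliseconds;
    only models present in both dicts are compared.
    """
    common = sorted(baseline.keys() & optimized.keys())
    if not common:
        raise ValueError("no models measured in both configurations")
    ratios = [speedup(baseline[m], optimized[m]) for m in common]
    return {
        "min": min(ratios),
        "max": max(ratios),
        "arith_mean": mean(ratios),
        "geo_mean": geometric_mean(ratios),
    }


if __name__ == "__main__":
    # Placeholder latencies for illustration only (not results from the paper).
    fp32 = {"model_a": 10.0, "model_b": 6.0}
    int8 = {"model_a": 8.0, "model_b": 3.0}
    print(summarize(fp32, int8))
```

With latencies, a speedup of 1.5× means the alternative configuration takes two thirds of the baseline time. If throughput (images per second) is measured instead, the ratio must be inverted (optimized divided by baseline). The geometric mean is often preferred for averaging ratios because it treats a 2× gain and a 2× loss symmetrically.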
