Execution Time Modeling for CNN Inference on Embedded GPUs

Machine learning has become a leading approach in computer vision. Convolutional Neural Networks (CNNs) in particular are widely used in edge-computing applications such as autonomous driving, for tasks like image recognition and object tracking. These applications face multiple constraints, including real-time requirements, energy consumption, and limited memory resources. Selecting the optimal CNN for a given GPU while maintaining high accuracy and performance is difficult, which makes prior knowledge of the execution time a necessary prerequisite before the final deployment of a CNN on an edge GPU platform. In this paper, we compare five execution time prediction models on a large set of CNN-based applications. The evaluated predictors rely on machine learning regression, and the proposed methodology uses only high-level CNN features: unlike state-of-the-art approaches, no implementation or profiling on the target hardware is required. In our experiments, Support Vector Regression and Artificial Neural Networks achieve a Mean Absolute Percentage Error (MAPE) of 5%. Our comparison shows that these models can rapidly explore a large space of CNN models or hardware configurations.
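To make the regression-based methodology concrete, the following is a minimal sketch (not the authors' code) of predicting CNN inference latency from high-level network features with Support Vector Regression and scoring it with MAPE. The feature set (FLOPs, parameter count, layer count, input resolution), the example values, and the hyperparameters are illustrative assumptions; only the overall approach (regression on high-level CNN features, no on-device profiling) follows the paper.

```python
# Sketch: execution-time prediction from high-level CNN features with SVR.
# All feature names, values, and hyperparameters below are illustrative.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

# Hypothetical high-level descriptors per network:
# [total FLOPs, parameter count, number of layers, input resolution].
# No profiling on the target GPU is assumed.
X = np.array([
    [0.72e9, 3.4e6, 28, 224],    # MobileNet-like network
    [15.5e9, 138e6, 16, 224],    # VGG16-like network
    [4.1e9, 25.6e6, 53, 224],    # ResNet-50-like network
    # ... many more (CNN, GPU configuration) samples in practice
])
y = np.array([12.3, 95.1, 31.7])  # measured latencies in ms (illustrative)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

# SVR with feature scaling; hyperparameters would normally be tuned,
# e.g. via random search.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X_train, y_train)

# MAPE = (100/n) * sum(|y_i - y_hat_i| / |y_i|), the paper's reported metric.
mape = mean_absolute_percentage_error(y_test, model.predict(X_test)) * 100
print(f"MAPE: {mape:.1f}%")
```

The same pipeline can be rerun with other regressors (e.g. an MLP or random forest) to reproduce the kind of model comparison described above; once trained, such a predictor evaluates a candidate CNN or hardware configuration in microseconds, which is what enables rapid design-space exploration.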
