Performance prediction for convolutional neural networks on edge GPUs

Edge computing is increasingly used for Artificial Intelligence (AI) workloads to address latency, privacy, and energy challenges, and Convolutional Neural Networks (CNNs) are now commonly deployed on edge devices across many applications. However, with their constrained compute resources and energy budgets, edge devices struggle to meet the latency requirements of CNNs while maintaining good accuracy. It is therefore crucial to choose the CNN with the best accuracy-latency trade-off while respecting hardware constraints. This paper presents and compares five widely used Machine Learning (ML) approaches for predicting the inference execution time of CNNs on edge GPUs. Beyond prediction accuracy, we also examine the time each method needs for training and hyperparameter tuning, and we compare the time required to run the resulting prediction models on different platforms. These methods greatly facilitate design space exploration by quickly identifying the best CNN for a target edge GPU. Experimental results show that XGBoost achieves a low average prediction error, even for unexplored and unseen CNN architectures. Random Forest reaches comparable accuracy but requires more training time and effort, while the remaining three approaches (OLS, MLP, and SVR) are less accurate for CNN performance estimation.
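To make the approach concrete, the sketch below trains an XGBoost regressor to predict inference latency from simple CNN descriptors. The feature set (layer count, parameter count, FLOPs, input resolution), the synthetic data, and the hyperparameters are illustrative assumptions, not the paper's actual features, dataset, or tuned settings; in practice the targets would come from profiling each network on the target edge GPU.

```python
# Minimal sketch of ML-based CNN latency prediction, assuming a hypothetical
# feature set; real data would come from profiling CNNs on the target edge GPU.
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_percentage_error

rng = np.random.default_rng(0)

# Each row describes one CNN: [num_layers, params (M), FLOPs (G), input_size].
X = rng.uniform(low=[10, 1, 0.1, 96], high=[200, 150, 20, 320], size=(500, 4))
# Synthetic "measured" latency in ms, standing in for real GPU measurements.
y = (0.5 * X[:, 1] + 2.0 * X[:, 2]
     + 0.01 * X[:, 0] * X[:, 3] / 100
     + rng.normal(0.0, 1.0, 500))

# Hold out some architectures to mimic evaluation on unseen CNNs.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.05)
model.fit(X_train, y_train)

mape = mean_absolute_percentage_error(y_test, model.predict(X_test))
print(f"MAPE on held-out CNNs: {mape:.2%}")
```

The same feature matrix could be fed to the other regressors compared in the paper (Random Forest, OLS, MLP, SVR) by swapping the model class, which is what makes the head-to-head comparison of accuracy, training time, and tuning effort straightforward.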
