A History-Based Auto-Tuning Framework for Fast and High-Performance DNN Design on GPU

While Deep Neural Networks (DNNs) are becoming increasingly popular, there is a growing trend to accelerate DNN applications on hardware platforms such as GPUs and FPGAs to gain higher performance and efficiency. However, tuning performance for such platforms is time-consuming due to the large design space and the high cost of evaluating each design point. Although many tuning algorithms, such as the XGBoost tuner and the genetic algorithm (GA) tuner, have been proposed in previous work to guide design space exploration, tuning time remains a critical problem. In this work, we propose a novel auto-tuning framework that optimizes DNN operator design on GPU by efficiently leveraging tuning history across different scenarios. Our experiments show that we achieve higher performance than state-of-the-art approaches, including the auto-tuning framework TVM and the hand-optimized library cuDNN, while reducing search time by 8.96x and 4.58x compared with the XGBoost tuner and the GA tuner in TVM, respectively.
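To illustrate the general idea of reusing tuning history when searching an operator's design space (a minimal sketch only, not the framework proposed in this work), the example below warm-starts a cost-model-based tuner with previously measured records through TVM's AutoTVM transfer-learning interface. The task object, log-file name, trial count, and runner settings are illustrative assumptions.

import os
from tvm import autotvm

def tune_with_history(task, log_file="conv2d_history.log", n_trial=200):
    # Cost-model-based tuner (the same XGBoost tuner the abstract compares against).
    tuner = autotvm.tuner.XGBTuner(task)

    # Reuse previously measured configurations, if any, to warm-start the cost model.
    if os.path.isfile(log_file):
        tuner.load_history(autotvm.record.load_from_file(log_file))

    # Compile candidate configurations locally and time them on the local GPU.
    measure_option = autotvm.measure_option(
        builder=autotvm.LocalBuilder(),
        runner=autotvm.LocalRunner(number=10, timeout=4),
    )

    # Explore the design space; new measurements are appended to the history log
    # so that later tuning runs can start from them.
    tuner.tune(
        n_trial=n_trial,
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file(log_file)],
    )

Here, task would be an AutoTVM tuning task for a single DNN operator (e.g., a convolution workload). Warm-starting the tuner this way captures only the basic spirit of history reuse; how history is exploited in different tuning scenarios is the subject of the proposed framework.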
