GPU Performance and Power Tuning Using Regression Trees

GPU performance and power tuning is difficult, requiring extensive user expertise and time-consuming trial and error. To accelerate design tuning, statistical design space exploration methods have been proposed. This article presents Starchart, a novel design space partitioning tool that uses regression trees to approach GPU tuning problems. Improving on prior work, Starchart offers more automation in identifying key design trade-offs, and it models design subspaces with distinctly different behaviors. Starchart achieves good model accuracy using very few random samples: fewer than 0.3% of the points in a given design space. Iterative sampling can target subspaces of interest more quickly.
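To make the idea of partitioning a design space with a regression tree concrete, the following is a minimal sketch, not Starchart's implementation. It assumes a hypothetical three-parameter design space (`block_size`, `unroll`, `use_shared`) and a synthetic `fake_runtime` function standing in for real measurements; all of these names and values are invented for illustration. Each split is chosen greedily to give the largest reduction in the sum of squared errors, which is the standard regression tree criterion.

```python
import itertools
import random

def sse(values):
    """Sum of squared errors of values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def partition(samples, param, threshold):
    """Split samples by whether the parameter value is <= threshold."""
    left = [(c, y) for c, y in samples if c[param] <= threshold]
    right = [(c, y) for c, y in samples if c[param] > threshold]
    return left, right

def best_split(samples):
    """Return (param, threshold, gain) with the largest SSE reduction, or None."""
    parent = sse([y for _, y in samples])
    best = None
    for param in samples[0][0]:
        levels = sorted({c[param] for c, _ in samples})
        for lo, hi in zip(levels, levels[1:]):
            threshold = (lo + hi) / 2
            left, right = partition(samples, param, threshold)
            gain = parent - sse([y for _, y in left]) - sse([y for _, y in right])
            if best is None or gain > best[2]:
                best = (param, threshold, gain)
    return best

def build_tree(samples, min_leaf=4, max_depth=3, depth=0):
    """Greedy recursive partitioning; each node records its sample mean."""
    values = [y for _, y in samples]
    node = {"n": len(values), "mean": sum(values) / len(values)}
    if depth >= max_depth or len(samples) < 2 * min_leaf:
        return node
    split = best_split(samples)
    if split is None or split[2] <= 0:
        return node
    param, threshold, _ = split
    left, right = partition(samples, param, threshold)
    if len(left) < min_leaf or len(right) < min_leaf:
        return node
    node.update(
        param=param,
        threshold=threshold,
        left=build_tree(left, min_leaf, max_depth, depth + 1),
        right=build_tree(right, min_leaf, max_depth, depth + 1),
    )
    return node

def fake_runtime(cfg):
    """Synthetic stand-in for a measurement (assumption, not real data)."""
    return 50.0 - 40.0 * cfg["use_shared"] + 0.5 * cfg["unroll"]

# Hypothetical design space: 5 * 4 * 2 = 40 configurations.
space = [
    {"block_size": b, "unroll": u, "use_shared": s}
    for b, u, s in itertools.product([32, 64, 128, 256, 512], [1, 2, 4, 8], [0, 1])
]

# Measure only a random subset of the space, then fit the tree to it.
rng = random.Random(0)
sampled = [(cfg, fake_runtime(cfg)) for cfg in rng.sample(space, 30)]
tree = build_tree(sampled)
print("root split:", tree["param"], "<=", tree["threshold"])
```

In this toy setup the root split falls on `use_shared`, the parameter that dominates the synthetic runtime, and the two resulting subspaces have clearly different mean runtimes. The toy space is tiny, so the sampled fraction here is far larger than the under-0.3% figure reported for Starchart; the sketch only illustrates the partitioning step.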
