Empirical performance modeling of GPU kernels using active learning

We focus on a design-of-experiments methodology for developing empirical performance models of GPU kernels. We recently developed an iterative active learning algorithm that adaptively selects parameter configurations in batches, for concurrent evaluation on CPU architectures, in order to build performance models over the parameter space. In this paper, we show how the algorithm can be adopted when concurrent evaluations are not possible, a setting that is particularly relevant when no GPU cluster is available. We present an empirical study of the algorithm on a diverse set of GPU kernels and hardware. We show that, even when concurrent evaluations are not possible, the default batch mode of the algorithm yields better models, and that the iterative active learning algorithm reduces the overall time required to obtain high-quality empirical performance models for GPU kernels.
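To make the idea of batch-mode active learning with sequential evaluation concrete, the following is a minimal sketch and not the algorithm studied in the paper. It assumes a generic uncertainty-sampling loop: a bootstrap ensemble of nearest-neighbor regressors stands in for the surrogate model, and `measure_runtime` is a hypothetical synthetic function standing in for timing a real GPU kernel. All names and parameters (`active_learning`, `n_init`, `batch_size`, `n_models`) are illustrative assumptions.

```python
import random
import statistics


def measure_runtime(config):
    """Hypothetical stand-in for timing a GPU kernel at one configuration."""
    block, unroll = config
    return (block - 96) ** 2 / 1000.0 + abs(unroll - 4) * 0.5 + 1.0


def distance(a, b):
    """Squared Euclidean distance between two configurations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train, config, k=3):
    """Predict runtime as the mean over the k nearest evaluated configurations."""
    nearest = sorted(train, key=lambda item: distance(item[0], config))[:k]
    return statistics.mean(y for _, y in nearest)


def ensemble_predictions(train, config, n_models, rng):
    """Predictions from models fit on bootstrap resamples of the training data."""
    preds = []
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]
        preds.append(knn_predict(sample, config))
    return preds


def active_learning(pool, n_init=5, batch_size=4, n_iters=5, n_models=10, seed=0):
    """Iteratively pick the batch of configurations the ensemble disagrees on most."""
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    # Seed the model with a few randomly chosen configurations.
    evaluated = [(c, measure_runtime(c)) for c in pool[:n_init]]
    remaining = pool[n_init:]
    for _ in range(n_iters):
        if not remaining:
            break
        # Score each unevaluated configuration by ensemble variance.
        scored = [
            (statistics.pvariance(ensemble_predictions(evaluated, c, n_models, rng)), c)
            for c in remaining
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        batch = [c for _, c in scored[:batch_size]]
        # No concurrency assumed: the batch is evaluated one configuration at a time.
        for c in batch:
            evaluated.append((c, measure_runtime(c)))
        remaining = [c for c in remaining if c not in batch]
    return evaluated
```

With a pool such as `[(b, u) for b in range(32, 257, 32) for u in (1, 2, 4, 8)]`, calling `active_learning(pool)` returns the evaluated `(configuration, runtime)` pairs that would then be used to fit the final performance model. The surrogate is refit only once per batch in this sketch, so the per-iteration modeling cost is shared across `batch_size` kernel runs even though those runs happen sequentially.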
