Performance Prediction Model and Analysis for Compute-Intensive Tasks on GPUs

Using Graphics Processing Units (GPUs) to solve general-purpose problems has received significant attention in both academia and industry. Harnessing the power of these devices, however, requires knowledge of the underlying architecture and programming model. In this paper, we develop analytical models to predict the performance of GPUs for computationally intensive tasks. Our models are based on varying the relevant parameters (including the total number of threads, the number of blocks, and the number of streaming multiprocessors) and predicting the performance of a program for a specified instance of these parameters. The approach can be used in heterogeneous environments where distinct types of GPU devices with different hardware configurations are employed.
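The abstract does not give the models' exact form. To illustrate what predicting run time from thread, block, and SM counts can look like, the sketch below assumes a simple "waves" model that is not taken from this paper. Thread blocks are assumed to be scheduled onto the SMs in rounds of at most `num_sms * blocks_per_sm` resident blocks, and run time is assumed to grow linearly with the number of rounds. The names `waves`, `fit_linear`, `predict_time`, and the occupancy parameter `blocks_per_sm` are hypothetical. The coefficients are fitted from timings on a single device, and the demo timings are synthetic values, not measurements.

```python
from typing import Sequence, Tuple


def waves(num_blocks: int, num_sms: int, blocks_per_sm: int) -> int:
    """Number of scheduling rounds ("waves") needed when at most
    num_sms * blocks_per_sm thread blocks can be resident at once.
    blocks_per_sm is an ASSUMED occupancy limit supplied by the caller."""
    capacity = num_sms * blocks_per_sm
    return (num_blocks + capacity - 1) // capacity  # integer ceiling division


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_time(a: float, b: float, total_threads: int, threads_per_block: int,
                 num_sms: int, blocks_per_sm: int) -> float:
    """Hypothetical model: time = a + b * waves, with the block count
    derived from the total number of threads."""
    num_blocks = (total_threads + threads_per_block - 1) // threads_per_block
    return a + b * waves(num_blocks, num_sms, blocks_per_sm)


if __name__ == "__main__":
    # Calibration on one device (16 SMs, 8 resident blocks per SM assumed).
    # The timings are SYNTHETIC, generated from 0.5 + 2.0 * waves, not measured.
    block_counts = [128, 256, 512, 1024]
    xs = [waves(n, 16, 8) for n in block_counts]
    ys = [0.5 + 2.0 * w for w in xs]
    a, b = fit_linear(xs, ys)

    # Predict an unseen configuration: 10**6 threads in blocks of 256.
    print(predict_time(a, b, 1_000_000, 256, num_sms=16, blocks_per_sm=8))
```

In a heterogeneous setting, only the wave count follows directly from a device's SM count. The fitted per-wave cost `b` would generally need recalibrating for each GPU type, since clock rates and per-SM throughput differ. The linear-in-waves assumption also ignores effects such as memory contention, so it is only a first approximation.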
