Performance-Portable Benchmarking Methods for Investigating Heterogeneous Computing Platforms

[1]  Jack J. Dongarra,et al.  From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming , 2012, Parallel Comput..

[2]  Stephen A. Jarvis,et al.  An investigation of the performance portability of OpenCL , 2013, J. Parallel Distributed Comput..

[3]  James C. Hoe,et al.  Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs? , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[4]  Jianbin Fang,et al.  A Comprehensive Performance Comparison of CUDA and OpenCL , 2011, 2011 International Conference on Parallel Processing.

[5]  Christoforos E. Kozyrakis,et al.  Understanding sources of inefficiency in general-purpose chips , 2010, ISCA.

[6]  Collin McCurdy,et al.  The Scalable Heterogeneous Computing (SHOC) benchmark suite , 2010, GPGPU-3.

[7]  Kevin Skadron,et al.  Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[8]  Scott Pakin,et al.  Entering the petaflop era: the architecture and performance of Roadrunner , 2008, HiPC 2008.

[9]  Samuel Williams,et al.  The Landscape of Parallel Computing Research: A View from Berkeley , 2006 .

[10]  Jack J. Dongarra,et al.  A Note on Auto-tuning GEMM for GPUs , 2009, ICCS.

[11]  Jack J. Dongarra,et al.  LINPACK Benchmark , 2011, Encyclopedia of Parallel Computing.

[12]  Wen-mei W. Hwu,et al.  Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .

[13]  Dale R. Shires,et al.  Investigation of Parallel Programmability and Performance of a Calxeda ARM Server Using OpenCL , 2013, Euro-Par Workshops.

[14]  Jack J. Dongarra,et al.  Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..

[15]  Dale R. Shires,et al.  Ray-Tracing-Based Geospatial Optimization for Heterogeneous Architectures Enhancing Situational Awareness , 2013, 2013 IEEE 16th International Conference on Computational Science and Engineering.

[16]  Simon McIntosh-Smith,et al.  On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures , 2014, ISC.

[17]  Walid A. Abu-Sufah,et al.  Auto-tuning of Sparse Matrix-Vector Multiplication on Graphics Processors , 2013, ISC.

[18]  Jing Zhang,et al.  OpenCL and the 13 dwarfs: a work in progress , 2012, ICPE '12.

[19]  Saman P. Amarasinghe,et al.  Portable performance on heterogeneous architectures , 2013, ASPLOS '13.