Performance-Portable Benchmarking Methods for Investigating Heterogeneous Computing Platforms
Dale R. Shires | James A. Ross | David A. Richie | Jamie Infantolino | Thomas M. Kendall | Song J. Park
[1] Jack J. Dongarra, et al. From CUDA to OpenCL: Towards a performance-portable solution for multi-platform GPU programming, 2012, Parallel Comput.
[2] Stephen A. Jarvis, et al. An investigation of the performance portability of OpenCL, 2013, J. Parallel Distributed Comput.
[3] James C. Hoe, et al. Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?, 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[4] Jianbin Fang, et al. A Comprehensive Performance Comparison of CUDA and OpenCL, 2011, 2011 International Conference on Parallel Processing.
[5] Christoforos E. Kozyrakis, et al. Understanding sources of inefficiency in general-purpose chips, 2010, ISCA.
[6] Collin McCurdy, et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite, 2010, GPGPU-3.
[7] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[8] Scott Pakin, et al. Entering the petaflop era: the architecture and performance of Roadrunner, 2008, HiPC 2008.
[9] Samuel Williams, et al. The Landscape of Parallel Computing Research: A View from Berkeley, 2006.
[10] Jack J. Dongarra, et al. A Note on Auto-tuning GEMM for GPUs, 2009, ICCS.
[11] Jack J. Dongarra, et al. LINPACK Benchmark, 2011, Encyclopedia of Parallel Computing.
[12] Wen-mei W. Hwu, et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[13] Dale R. Shires, et al. Investigation of Parallel Programmability and Performance of a Calxeda ARM Server Using OpenCL, 2013, Euro-Par Workshops.
[14] Jack J. Dongarra, et al. Automated empirical optimizations of software and the ATLAS project, 2001, Parallel Comput.
[15] Dale R. Shires, et al. Ray-Tracing-Based Geospatial Optimization for Heterogeneous Architectures Enhancing Situational Awareness, 2013, 2013 IEEE 16th International Conference on Computational Science and Engineering.
[16] Simon McIntosh-Smith, et al. On the Performance Portability of Structured Grid Codes on Many-Core Computer Architectures, 2014, ISC.
[17] Walid A. Abu-Sufah, et al. Auto-tuning of Sparse Matrix-Vector Multiplication on Graphics Processors, 2013, ISC.
[18] Jing Zhang, et al. OpenCL and the 13 dwarfs: a work in progress, 2012, ICPE '12.
[19] Saman P. Amarasinghe, et al. Portable performance on heterogeneous architectures, 2013, ASPLOS '13.