Optimizing Techniques for OpenCL Programs on Heterogeneous Platforms

Heterogeneous platforms that are consisted of CPU and add-on streaming processors are widely used in modern computer systems. These add-on processors provide substantially more computation capability and memory bandwidth than conventional multi-cores platforms. General-purpose computations can also be leveraged onto these add-on processors. In order to utilize their potential performance, programming these streaming processors is challenging because of their diverse underlying architectural characteristics. Several optimization techniques are applied on OpenCL-compatible heterogeneous platforms to achieve thread-level, data-level, and instruction-level parallelism. The architectural implications of these techniques and optimization principles are discussed. Finally, a case study of MRI-Q benchmark will be addressed to illustrate to capabilities of these optimization techniques. The experimental results reveal the speedup from non-optimized to optimized kernel can vary from 8 to 63 on different target platforms.

[1]  Nagavijayalakshmi Vydyanathan,et al.  Parallel discrete wavelet transform using the Open Computing Language: a performance and portability study , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).

[2]  Haoran Yi,et al.  How GPUs Can Improve the Quality of Magnetic Resonance Imaging , 2011 .

[3]  Wen-mei W. Hwu,et al.  Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.

[4]  Jens Breitbart,et al.  OpenCL - An effective programming model for data parallel computations at the Cell Broadband Engine , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).

[5]  Akila Gothandaraman,et al.  Comparing Hardware Accelerators in Scientific Applications: A Case Study , 2011, IEEE Transactions on Parallel and Distributed Systems.

[6]  Bixia Zheng,et al.  Twin Peaks: A Software Platform for Heterogeneous Computing on General-Purpose and Graphics Processors , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[7]  Jong-Deok Choi,et al.  An OpenCL framework for heterogeneous multicores with local memory , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[8]  Kevin Skadron,et al.  A performance study of general-purpose applications on graphics processors using CUDA , 2008, J. Parallel Distributed Comput..