OCL-BodyScan: A Case Study for Application-centric Programming of Many-Core Processors

Application development for many-core processors is predominantly hardware-centric: programmers design, implement, and optimize applications for a pre-chosen target platform. While this approach may deliver very good performance, it lacks portability, which makes it inefficient for applications that target multiple architectures or large-scale parallel platforms with heterogeneous many-core nodes. In this work, we focus on application portability. We propose an application-centric approach for developing parallel workloads for many-cores, and we use OpenCL to preserve portability until the very last optimization stages. We validate this approach using 3D body scan, a data-intensive application with soft real-time constraints. To this end, we design and implement OCL-BodyScan (the portable OpenCL-based version of 3D body scan) and evaluate its performance on three families of platforms: general-purpose multi-cores, graphics processing units, and the Cell/B.E. Our experiments show that our application-centric strategy enables portability and leads to good performance results. Additionally, typical platform-specific optimizations can be applied in the final implementation stages, leading to performance results similar to those obtained using the native tool-chains.
