Towards an Effective Unified Programming Model for Many-Cores

Building an effective programming model for many-core processors is challenging. On the one hand, the increasing variety of platforms and their specific programming models force users to take a hardware-centric approach not only for implementing parallel applications, but also for designing them. This approach diminishes portability and, eventually, limits performance. On the other hand, to effectively cope with the increased number of large-scale workloads that require parallelization, a portable, application-centric programming model is desirable. Such a model enables programmers to focus first on extracting and exploiting parallelism from their applications, as opposed to generating parallelism for specific hardware, and only second on platform-specific implementation and optimizations. In this paper, we first present a survey of programming models designed for programming three families of many-cores: general-purpose many-cores (GPMCs), graphics processing units (GPUs), and the Cell/B.E. We analyze the usability of these models, their ability to improve platform programmability, and the specific features that contribute to this improvement. Next, we discuss two types of generic models, parallelism-centric and application-centric, and analyze their features and impact on platform programmability. Based on this analysis, we recommend two application-centric models (OmpSs and OpenCL) as promising candidates for a unified programming model for many-cores, and we discuss potential enhancements for them.
