The PEPPHER Approach to Programmability and Performance Portability for Heterogeneous Many-Core Architectures

The European FP7 project PEPPHER addresses programmability and performance portability for current and emerging heterogeneous many-core architectures. As its main idea, the project proposes a multi-level parallel execution model composed of potentially parallelized components that exist in variants suited to different types of cores, memory configurations, input characteristics, and optimization criteria, and couples this model with dynamic and static resource- and architecture-aware scheduling mechanisms. Crucial to PEPPHER is that components can be made performance-aware, which allows more efficient dynamic and static scheduling on the concrete resources available. The flexibility of the software model, combined with a customizable, heterogeneous, memory- and topology-aware run-time system, is key to efficiently exploiting the resources of each concrete hardware configuration. The project takes a holistic approach: it relies on existing paradigms, interfaces, and languages for the parallelization of components, and develops a prototype framework, a methodology for extending the framework, and guidelines for constructing performance-portable software and systems for heterogeneous many-core processors, including paths for migrating existing software. This paper gives a high-level project overview and presents a specific example showing how the PEPPHER component variant model and resource-aware run-time system enable performance portability of a numerical kernel.

© 2012 The authors and IOS Press. All rights reserved.
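To make the idea of performance-aware component variants concrete, here is a minimal sketch, not PEPPHER's actual interface. A component exposes several implementation variants of the same kernel, each annotated with a performance model, and a selector picks the variant with the lowest predicted cost for the given input size among the core types currently available. The names `Variant`, `Component`, `select`, and `scale`, the device labels, and the linear cost models are all hypothetical assumptions chosen for illustration; the "gpu" variant is only a stand-in that runs the same Python code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

# Hypothetical sketch of performance-aware variant selection; the class and
# method names below are illustrative only, not the PEPPHER API.


@dataclass
class Variant:
    name: str
    device: str  # type of core this variant targets, e.g. "cpu" or "gpu"
    kernel: Callable[[List[float]], List[float]]
    predict: Callable[[int], float]  # performance model: input size -> estimated seconds


@dataclass
class Component:
    name: str
    variants: List[Variant] = field(default_factory=list)

    def select(self, n: int, available: Sequence[str]) -> Variant:
        """Return the variant with the lowest predicted time on an available device."""
        candidates = [v for v in self.variants if v.device in available]
        if not candidates:
            raise RuntimeError(f"no variant of {self.name!r} runs on {list(available)}")
        return min(candidates, key=lambda v: v.predict(n))

    def __call__(self, data: List[float], available: Sequence[str]) -> Tuple[str, List[float]]:
        # Dynamic selection: choose a variant based on the actual input size.
        variant = self.select(len(data), available)
        return variant.name, variant.kernel(data)


def scale(data: List[float]) -> List[float]:
    # Hypothetical numerical kernel: multiply every element by 2.
    return [2.0 * x for x in data]


# Two variants of the same kernel. The "gpu" variant reuses the Python kernel as a
# stand-in; only its assumed cost model differs (fixed offload overhead, lower per-element cost).
scale_component = Component(
    name="scale",
    variants=[
        Variant("scale_cpu", "cpu", scale, predict=lambda n: 1e-8 * n),
        Variant("scale_gpu", "gpu", scale, predict=lambda n: 1e-4 + 1e-10 * n),
    ],
)

if __name__ == "__main__":
    print(scale_component.select(1_000, ["cpu", "gpu"]).name)       # -> scale_cpu
    print(scale_component.select(10_000_000, ["cpu", "gpu"]).name)  # -> scale_gpu
    print(scale_component.select(10_000_000, ["cpu"]).name)         # -> scale_cpu (no GPU present)
```

With these assumed cost models the two predictions cross at roughly 10,100 elements, so small inputs stay on the CPU variant and large ones go to the GPU variant, unless no GPU is available. A static composition step could make the same choice ahead of time whenever the input size is known before execution.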
