PEPPHER: Efficient and Productive Usage of Hybrid Computing Systems

PEPPHER, a three-year European FP7 project, addresses the efficient utilization of hybrid (heterogeneous) computer systems consisting of multicore CPUs with GPU-type accelerators. This article outlines the PEPPHER performance-aware component model, its performance prediction techniques, its runtime system, and other aspects of the project. A larger example demonstrates the performance portability of the PEPPHER approach across hybrid systems with one to four GPUs.
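The abstract does not spell out how a performance-aware runtime chooses between CPU and GPU implementation variants of a component. As a rough illustration only, the sketch below assumes two simplifications: a history-based performance model that predicts a variant's run time as the mean of past measurements for the same problem size, and a greedy earliest-finish-time policy that assigns each task to the worker that would complete it first. The names `HistoryModel`, `Worker`, `schedule` and the variant labels `"cpu"`/`"cuda"` are hypothetical. They are not PEPPHER's API, and this is not the project's actual scheduler.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class HistoryModel:
    """Hypothetical history-based performance model: stores measured run
    times per (variant, problem size) and predicts their mean."""
    samples: dict = field(default_factory=lambda: defaultdict(list))

    def record(self, variant, size, seconds):
        self.samples[(variant, size)].append(seconds)

    def predict(self, variant, size):
        runs = self.samples.get((variant, size))  # .get does not create entries
        if not runs:
            return None  # never measured: no prediction available
        return sum(runs) / len(runs)


@dataclass
class Worker:
    name: str
    kind: str              # implementation variant this worker runs, e.g. "cpu" or "cuda"
    ready_at: float = 0.0  # time at which the worker becomes idle


def schedule(task_sizes, workers, model):
    """Greedy earliest-finish-time mapping of independent tasks to workers.
    Mutates each worker's ready_at; returns (size, worker name, finish time)."""
    plan = []
    for size in task_sizes:
        best = None
        for w in workers:
            cost = model.predict(w.kind, size)
            if cost is None:
                continue  # a real runtime would first run and measure this variant
            finish = w.ready_at + cost
            if best is None or finish < best[0]:  # ties go to the earlier-listed worker
                best = (finish, w)
        if best is None:
            raise ValueError(f"no calibrated variant for problem size {size}")
        finish, w = best
        w.ready_at = finish
        plan.append((size, w.name, finish))
    return plan
```

For example, with a measured 3.5 s on the CPU variant and 1.0 s on the CUDA variant, five equal tasks on one CPU worker and one GPU worker go mostly to the GPU, with one task offloaded to the CPU. Adding a second GPU worker shortens the predicted completion time from 4.0 s to 3.0 s without changing the task code. This mirrors, in miniature, the kind of portability across one to four GPUs that the abstract describes.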
