Capturing and Composing Parallel Patterns with Intel CnC
Ryan Newton, Frank Schlimbach, Mark Hampton, Kathleen Knobe
Intel

The most accessible and successful parallel tools today are those that ask programmers to write only isolated serial kernels, hiding parallelism behind a library interface. Examples include Google’s Map-Reduce [5], CUDA [13], and STAPL [12]. This encapsulation approach applies to a wide range of structured, well-understood algorithms, which we call parallel patterns. Today’s high-level systems tend to encapsulate only a single pattern each. We therefore explore the use of Intel CnC as a single framework for capturing and composing multiple patterns.
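As a rough illustration of the encapsulation approach, the sketch below shows a programmer supplying only serial kernels while a small library layer decides how to run them in parallel. It is written in Python with the standard `concurrent.futures` module and is not Intel CnC's API; the names `parallel_map`, `map_reduce`, `square`, and `add` are hypothetical and chosen only for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def parallel_map(kernel, items, workers=4):
    """Hypothetical 'map' pattern: applies a serial kernel to every item.

    The thread pool is an implementation detail hidden from the caller.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order.
        return list(pool.map(kernel, items))


def map_reduce(map_kernel, reduce_kernel, items, initial, workers=4):
    """Hypothetical 'map-reduce' pattern composed from parallel_map.

    The map phase runs in parallel; the reduce phase here is a plain
    serial fold, which keeps the sketch short.
    """
    mapped = parallel_map(map_kernel, items, workers)
    return reduce(reduce_kernel, mapped, initial)


# Serial kernels written by the user; neither mentions threads.
def square(x):
    return x * x


def add(a, b):
    return a + b


if __name__ == "__main__":
    # Two patterns used side by side through the same small interface.
    print(parallel_map(square, range(5)))        # [0, 1, 4, 9, 16]
    print(map_reduce(square, add, range(5), 0))  # 30
```

Here the second pattern is built from the first, which hints at why a single framework for several patterns is attractive: the patterns share one scheduling layer instead of each bringing its own runtime.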

[1] Philip Wadler, et al. Linear Types can Change the World!, 1990, Programming Concepts and Methods.

[2] Guy E. Blelloch, et al. Provably efficient scheduling for languages with fine-grained parallelism, 1995, SPAA '95.

[3] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.

[4] Lawrence Rauchwerger, et al. Standard Templates Adaptive Parallel Library (STAPL), 1998, LCR.

[5] Hans-Wolfgang Loidl, et al. Algorithm + strategy = parallelism, 1998, Journal of Functional Programming.

[6] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[7] Jean-Thierry Lapresté, et al. Quaff: efficient C++ design for parallel skeletons, 2006, Parallel Comput.

[8] James Reinders, et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism, 2007.

[9] John H. Reppy, et al. A scheduling framework for general-purpose parallel languages, 2008, ICFP.

[10] Herbert Kuchen, et al. The Münster Skeleton Library Muesli: A comprehensive overview, 2009.

[11] Benjamin Hindman, et al. Lithe: enabling efficient composition of parallel libraries, 2009.

[12] J. Kulpa, et al. Time-frequency analysis using NVIDIA compute unified device architecture (CUDA), 2009, Symposium on Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments (WILGA).

[13] Andrew Lumsdaine, et al. PFunc: modern task parallelism for modern high performance computing, 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.