Nested Parallelism with Algorithmic Skeletons

Nested parallelism is a natural way to express programs for hierarchical systems. It enables a compositional programming approach in which each level of the program can be mapped onto a level of the system hierarchy. In this paper, we present nested algorithm composition in the STAPL Skeleton Library (SSL), which uses a nested dataflow model as its internal representation. We show how a high-level program specification using SSL allows for asynchronous computation and improved locality. We study both the specification and the performance of the STAPL implementation of Kripke, a mini-app developed by Lawrence Livermore National Laboratory. Kripke has multiple levels of parallelism and a number of data layouts, making it an excellent test bed for evaluating a nested parallel programming approach. Performance results are provided for six different nesting orders of the benchmark, demonstrating the flexibility and performance of nested algorithmic skeleton composition in STAPL.
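To make the idea of nested skeleton composition concrete, the following is a minimal sequential sketch in plain C++14. It is not the SSL API: the names map_skeleton, reduce_skeleton, and nested_total are hypothetical and introduced only for illustration, and the sketch runs sequentially rather than on a dataflow runtime. It shows only the compositional structure, in which a reduce skeleton is passed as the operator of a map skeleton, so that the inner skeleton runs once per element of the outer one.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <type_traits>
#include <vector>

// Hypothetical "map" skeleton (not the SSL API): returns a callable that
// applies `op` to every element of its input and collects the results.
template <typename Op>
auto map_skeleton(Op op) {
  return [op](auto const& input) {
    using result_t = std::decay_t<decltype(op(*input.begin()))>;
    std::vector<result_t> out;
    out.reserve(input.size());
    for (auto const& x : input) {
      out.push_back(op(x));  // each call is independent of the others
    }
    return out;
  };
}

// Hypothetical "reduce" skeleton: returns a callable that folds its input
// with the binary operator `op`, starting from `init`.
template <typename T, typename BinaryOp>
auto reduce_skeleton(T init, BinaryOp op) {
  return [init, op](auto const& input) {
    return std::accumulate(input.begin(), input.end(), init, op);
  };
}

// Nested composition: the inner reduce is the operator of the outer map,
// and a second reduce combines the per-row results.
inline int nested_total(std::vector<std::vector<int>> const& grid) {
  auto row_sum  = reduce_skeleton(0, std::plus<int>{});  // inner level
  auto sum_rows = map_skeleton(row_sum);                 // outer level
  auto total    = reduce_skeleton(0, std::plus<int>{});  // combine rows
  return total(sum_rows(grid));
}
```

Calling nested_total on the rows {1, 2, 3}, {4, 5}, and {6} sums each row with the inner skeleton and then combines the three partial sums. In a parallel setting, the outer map and the inner reduce are the two levels that could be assigned to different levels of the machine hierarchy; this sketch does not attempt that mapping.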
