Programming the memory hierarchy revisited: supporting irregular parallelism in Sequoia

We describe two novel constructs for programming parallel machines with multi-level memory hierarchies: call-up, which allows a child task to invoke computation on its parent, and spawn, which launches a dynamically determined number of parallel children until a termination condition in the parent is met. We show that together these constructs allow applications with irregular parallelism to be programmed in a straightforward manner, and that they complement and can be combined with constructs for expressing regular parallelism. We have implemented spawn and call-up in Sequoia, and we present an experimental evaluation on a number of irregular applications.
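The semantics of the two constructs can be illustrated with a small sketch. This is not Sequoia syntax and not the authors' implementation: it is a Python thread-based model written under the assumption that call-ups execute atomically on parent-owned data and that spawn re-checks the parent's termination condition between batches of children. All names (`Parent`, `take`, `push`, `accumulate`, `finished`, `child`, `spawn`, `width`) are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Parent:
    """Hypothetical parent task that owns the shared worklist."""

    def __init__(self, items):
        self._lock = threading.Lock()
        self._work = list(items)
        self.total = 0

    # The methods below model call-up targets: a child invokes them, but
    # they run on the parent's data, one caller at a time (assumption).
    def take(self):
        with self._lock:
            return self._work.pop() if self._work else None

    def push(self, item):
        with self._lock:
            self._work.append(item)

    def accumulate(self, value):
        with self._lock:
            self.total += value

    def finished(self):
        # Termination condition evaluated in the parent.
        with self._lock:
            return not self._work


def child(parent):
    """One child task; how much work it creates is data dependent."""
    item = parent.take()                 # call-up: fetch work from the parent
    if item is None:
        return                           # nothing left for this child
    if item > 1:
        parent.push(item // 2)           # call-up: hand new work to the parent
        parent.push(item - item // 2)
    else:
        parent.accumulate(item)          # call-up: report a result


def spawn(parent, width=4):
    """Launch batches of children until the parent's condition holds."""
    launched = 0
    with ThreadPoolExecutor(max_workers=width) as pool:
        while not parent.finished():
            futures = [pool.submit(child, parent) for _ in range(width)]
            for future in futures:
                future.result()          # wait for the batch; re-raise errors
            launched += width
    return launched


if __name__ == "__main__":
    p = Parent([10, 3, 1])
    count = spawn(p)
    print(p.total, count)
```

The number of children launched is not known in advance: it depends on how many times items are split at run time, which is the kind of irregular parallelism the abstract refers to. Waiting for each batch before re-checking `finished()` is a simplification chosen here so that the termination test never races with in-flight children.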
