Sequoia: Programming the Memory Hierarchy

We present Sequoia, a programming language designed to facilitate the development of memory-hierarchy-aware parallel programs that remain portable across modern machines with different memory hierarchy configurations. Sequoia abstractly exposes hierarchical memory in the programming model and provides language mechanisms to describe communication vertically through the machine and to localize computation to particular memory locations within it. We have implemented a complete programming system, including a compiler and runtime systems for Cell processor-based blade systems and distributed memory clusters, and we demonstrate efficient performance running Sequoia programs on both of these platforms.
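The idea of exposing the memory hierarchy can be illustrated with a small sketch. The code below is Python, not Sequoia syntax, and it is not taken from the Sequoia implementation. The names `LEVELS`, `matmul_task`, `matmul_leaf`, `copy_block`, and `write_block`, as well as the block sizes, are assumptions made up for illustration. It models a blocked matrix multiply in which each recursive call works at one memory level, copies blocks down to the next smaller level explicitly (the vertical communication), and runs the innermost loop only on data that already resides in the smallest level.

```python
# Illustrative sketch only: this is Python, not Sequoia syntax.
# All names and sizes below are hypothetical.

# Hypothetical machine description: the block edge length (in elements)
# that fits at each memory level, from outermost to innermost.
LEVELS = [("main_memory", 8), ("local_store", 4), ("registers", 2)]


def copy_block(mat, r0, c0, size):
    """Model a transfer down the hierarchy: copy a sub-block into a new buffer."""
    return [row[c0:c0 + size] for row in mat[r0:r0 + size]]


def write_block(mat, r0, c0, block):
    """Model a transfer back up the hierarchy: store a block into its parent."""
    for i, row in enumerate(block):
        mat[r0 + i][c0:c0 + len(row)] = row


def matmul_leaf(a, b, c):
    """Innermost task: a plain triple loop on data local to the smallest level."""
    n, k, m = len(a), len(b), len(b[0])
    for i in range(n):
        for j in range(m):
            acc = c[i][j]
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc


def matmul_task(a, b, c, level=0):
    """Compute C += A * B, descending one memory level per recursive call."""
    if level == len(LEVELS) - 1:
        matmul_leaf(a, b, c)
        return
    n, k, m = len(a), len(b), len(b[0])
    tile = LEVELS[level + 1][1]  # block size that fits in the child level
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            c_blk = copy_block(c, i, j, tile)          # copy C block down
            for p in range(0, k, tile):
                a_blk = copy_block(a, i, p, tile)      # copy A block down
                b_blk = copy_block(b, p, j, tile)      # copy B block down
                matmul_task(a_blk, b_blk, c_blk, level + 1)
            write_block(c, i, j, c_blk)                # copy result back up


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = [[0, 0], [0, 0]]
    matmul_task(a, b, c)
    print(c)
```

In this sketch only the `LEVELS` table describes the machine, so retargeting to a different hierarchy means changing that table rather than the task code. That separation is the portability property the abstract describes, shown here in a much simplified form.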
