The design, implementation and evaluation of Jade: a portable, implicitly parallel programming language

Over the last decade, research in parallel computer architecture has led to the development of many new parallel machines. These machines have the potential to dramatically increase the resources available for solving important computational problems. The widespread use of these machines, however, has been limited by the difficulty of developing useful parallel software. This thesis presents the design, implementation and evaluation of Jade, a new programming language for parallel computations that exploit task-level concurrency. Jade is structured as a set of constructs that programmers use to specify how a program written in a standard sequential, imperative language accesses data. The implementation dynamically analyzes these specifications to automatically extract the concurrency and map the computation onto the parallel machine. The resulting parallel execution preserves the semantics of the original serial program. We have implemented Jade on a wide variety of parallel computing platforms: shared-memory multiprocessors such as the Stanford DASH machine, homogeneous message-passing machines such as the Intel iPSC/860, and heterogeneous networks of workstations. Jade programs port without modification between all of these platforms. We evaluate the design and implementation of Jade by parallelizing several complete scientific and engineering applications in Jade and executing them on several of these platforms. We analyze how well Jade supports the process of developing these applications and present results that characterize how well they perform.
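The idea of extracting concurrency from declared data accesses can be illustrated with a minimal sketch. It is not Jade syntax and not the Jade implementation: the `Task` class, the `reads`/`writes` sets, and the helper functions are hypothetical names introduced only for this illustration. The sketch assumes each task states which objects it reads and writes, then orders two tasks only when one of them writes an object the other one touches, so that every conflicting pair runs in the original serial order.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A unit of work plus its declared data accesses (hypothetical model)."""
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def conflicts(earlier, later):
    """True if the two tasks must keep their serial order."""
    return bool(
        earlier.writes & (later.reads | later.writes)  # read-after-write, write-after-write
        or earlier.reads & later.writes                # write-after-read
    )


def dependence_graph(tasks):
    """Map each task name to the earlier tasks it has to wait for."""
    return {
        later.name: [e.name for e in tasks[:i] if conflicts(e, later)]
        for i, later in enumerate(tasks)
    }


def schedule_waves(tasks):
    """Group tasks into waves; tasks in one wave have no conflicts with each other."""
    deps = dependence_graph(tasks)
    level = {}
    for t in tasks:  # tasks arrive in serial order, so dependencies already have a level
        level[t.name] = 1 + max((level[d] for d in deps[t.name]), default=-1)
    waves = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for t in tasks:
        waves[level[t.name]].append(t.name)
    return waves


# A serial program expressed as tasks in their original execution order.
program = [
    Task("init_a", writes=frozenset({"a"})),
    Task("init_b", writes=frozenset({"b"})),
    Task("sum_ab", reads=frozenset({"a", "b"}), writes=frozenset({"c"})),
    Task("scale_a", reads=frozenset({"a"}), writes=frozenset({"a"})),
    Task("report", reads=frozenset({"c"})),
]

if __name__ == "__main__":
    for wave in schedule_waves(program):
        print(wave)
```

In this example `init_a` and `init_b` touch disjoint objects and can run concurrently, while `scale_a` has to wait for `sum_ab` because it overwrites an object that `sum_ab` reads. A real runtime would discover tasks dynamically and dispatch them as their dependencies complete rather than in fixed waves; the wave grouping is used here only to keep the sketch short.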
