Experience with a clustered parallel reduction machine

A clustered architecture has been designed to exploit divide-and-conquer parallelism in functional programs. The programming methodology developed for the machine is based on explicit annotations and program transformations. It has been applied successfully to a number of algorithms, resulting in a benchmark suite of small and medium-sized parallel functional programs. Sophisticated compilation techniques are used, such as strictness analysis on non-flat domains and code generation for RISC and VLIW targets. Parallel jobs are distributed by an efficient hierarchical scheduler. A special processor for graph reduction has been designed as a basic building block of the machine. A prototype of a single-cluster machine has been built from stock hardware. This paper describes the experience gained in the project and its current state.
