DPF: a data parallel Fortran benchmark suite

We present the Data Parallel Fortran (DPF) benchmark suite, a set of data parallel Fortran codes for evaluating data parallel compilers on any target parallel architecture, with either shared or distributed memory. The codes are provided in basic, optimized, and several library versions. The benchmarks cover collective communication functions, scientific software library functions, and application kernels that reflect the computational structure and communication patterns of fluid dynamics simulations, fundamental physics, and molecular studies in chemistry or biology. The DPF benchmark suite assumes the language model of High Performance Fortran and provides the following performance evaluation metrics: busy and elapsed times, FLOP rates, FLOP count, memory usage, communication patterns, local memory access, and arithmetic efficiency, as well as operation and communication counts per iteration. An instance of the benchmark suite was fully implemented in CM-Fortran and tested on the CM-5.
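
Several of the metrics listed above are simple ratios of measured quantities. The sketch below shows how they relate, assuming the conventional definitions (FLOP rate = FLOP count / time; arithmetic efficiency = achieved FLOP rate / peak FLOP rate) rather than the suite's exact formulas. The record type, function names, and all numbers are hypothetical and for illustration only; "busy time" is assumed to mean the portion of the run during which the processors were actually executing the program.

```python
from dataclasses import dataclass


@dataclass
class RunMeasurement:
    """Hypothetical record of one benchmark run (field names are illustrative)."""
    flop_count: float       # floating-point operations performed by the run
    busy_seconds: float     # assumed: time the processors spent executing
    elapsed_seconds: float  # wall-clock time for the run
    iterations: int         # iterations of the benchmark's main loop


def flop_rate(m: RunMeasurement, use_busy: bool = False) -> float:
    """FLOP rate in FLOP/s, using busy or elapsed time as the denominator."""
    t = m.busy_seconds if use_busy else m.elapsed_seconds
    if t <= 0:
        raise ValueError("time must be positive")
    return m.flop_count / t


def arithmetic_efficiency(m: RunMeasurement, peak_flops: float,
                          use_busy: bool = False) -> float:
    """Fraction of the (hypothetical) machine peak FLOP rate actually achieved."""
    if peak_flops <= 0:
        raise ValueError("peak FLOP rate must be positive")
    return flop_rate(m, use_busy) / peak_flops


def flops_per_iteration(m: RunMeasurement) -> float:
    """Operation count per iteration, independent of machine speed."""
    return m.flop_count / m.iterations


if __name__ == "__main__":
    # Made-up measurements, not results from the DPF suite.
    run = RunMeasurement(flop_count=8e9, busy_seconds=2.0,
                         elapsed_seconds=2.5, iterations=100)
    print(flop_rate(run), flop_rate(run, use_busy=True))
    print(arithmetic_efficiency(run, peak_flops=1e10))
    print(flops_per_iteration(run))
```

Reporting rates against both busy and elapsed time separates the cost of the computation itself from overheads outside it, while per-iteration operation counts give a machine-independent measure that can be compared across compilers.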
