A quantitative study of parallel scientific applications with explicit communication

This paper studies the behavior of scientific applications running on distributed memory parallel computers. Our goal is to quantify the floating point, memory, I/O, and communication requirements of highly parallel scientific applications that perform explicit communication. In addition to quantifying these requirements for fixed problem sizes and numbers of processors, we develop analytical models for the effects of changing the problem size and the degree of parallelism for several of the applications.

The contribution of our paper is that it provides quantitative data about real parallel scientific applications in a manner that is largely independent of the specific machine on which the application was run. Such data, which are clearly very valuable to an architect who is designing a new parallel computer, were not previously available. For example, the majority of research papers in interconnection networks have used simulated communication loads consisting of fixed-size messages. Our data, which show that using such simulated loads is unrealistic, can be used to generate more realistic communication loads.
