Paper Information - Finding and Exploiting Parallelism in an Ocean Simulation Program: Experience, Results, and Implications


Abstract: How to code and compile large application programs for execution on parallel processors is perhaps the biggest challenge facing the widespread adoption of multiprocessing. To gain insight into this problem, an ocean simulation application was converted to a parallel version. The parallel program demonstrated near-linear speed-up on an Encore Multimax, a 16-processor bus-based shared-memory machine. Parallelizing an existing sequential application, not just a single loop or computational kernel, leads to interesting insights about what issues are significant in the process of finding and implementing parallelism, and what the major challenges are. Three levels of approach to the problem of finding parallelism (loop-level parallelization, program restructuring, and algorithm modification) were attempted, with widely varying results. Loop-level parallelization did not scale sufficiently. High-level restructuring was useful for much of the application, but obtaining an efficient parallel program required algorithm changes to one portion of it. Implementation issues for scalable performance, such as data locality and synchronization, are also discussed. The nature, requirements, and success of the various transformations lend insight into the design of parallelizing tools and parallel programming environments.

John L. Hennessy | Jaswinder Pal Singh

[1] R. Sweet. A Cyclic Reduction Algorithm for Solving Block Tridiagonal Systems of Arbitrary Dimension, 1977.

[2] William R. Holland, et al. The Role of Mesoscale Eddies in the General Circulation of the Ocean—Numerical Experiments Using a Wind-Driven Quasi-Geostrophic Model, 1978.

[3] Worley. Information requirements and the implications for parallel computation. Doctoral thesis, 1988.

[4] Paul Feautrier, et al. A New Solution to Coherence Problems in Multicache Systems, 1978, IEEE Transactions on Computers.

[5] E. L. Lusk, et al. Use of monitors in FORTRAN: a tutorial on the barrier, self-scheduling DO-loop, and askfor monitors, 1985.

[6] G. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities, 1967, AFIPS '67 (Spring).

[7] Gene H. Golub, et al. Matrix computations, 1983.

[8] Anoop Gupta, et al. COOL: a language for parallel programming, 1990.