Finding and Exploiting Parallelism in an Ocean Simulation Program: Experience, Results, and Implications

Abstract

How to code and compile large application programs for execution on parallel processors is perhaps the biggest challenge facing the widespread adoption of multiprocessing. To gain insight into this problem, an ocean simulation application was converted to a parallel version. The parallel program demonstrated near-linear speed-up on an Encore Multimax, a 16-processor bus-based shared-memory machine. Parallelizing an existing sequential application, not just a single loop or computational kernel, leads to interesting insights about what issues are significant in the process of finding and implementing parallelism, and what the major challenges are. Three levels of approach to the problem of finding parallelism (loop-level parallelization, program restructuring, and algorithm modification) were attempted, with widely varying results. Loop-level parallelization did not scale sufficiently. High-level restructuring was useful for much of the application, but obtaining an efficient parallel program required algorithm changes to one portion of it. Implementation issues for scalable performance, such as data locality and synchronization, are also discussed. The nature, requirements, and success of the various transformations lend insight into the design of parallelizing tools and parallel programming environments.