A distributed computing approach to improve the performance of the Parallel Ocean Program (v2.1)

Abstract. The Parallel Ocean Program (POP) is used in many strongly eddying ocean circulation simulations. Ideally, such simulations would span thousands of years, but the current performance of POP makes runs of that length impractical. In this work, we present two methods to improve the performance of POP using a new distributed computing approach. The first is a block-partitioning scheme that optimizes the load balancing of POP so that it runs efficiently in a multi-platform setting. The second is the implementation of part of the POP model code on graphics processing units (GPUs). We show that combining the two methods also leads to a substantial performance increase when running POP simultaneously over multiple computational platforms.
