Parallelization of the NAPOM Implementation

In this paper, the code for the North Atlantic Princeton Ocean Model (NAPOM) used by the Marine Biology Station (MBS) is parallelized and optimized. The FORTRAN source code and the hardware architecture of the MBS cluster are examined to characterize how NAPOM executes and to identify bottlenecks in both the software and the hardware. Based on this analysis, the most effective optimization and parallelization steps are planned. The most time-consuming modules of the NAPOM package are optimized for maximal performance on the target hardware. The preprocessing modules are distributed across multiple computational nodes, while all mutually independent, computationally complex operations are parallelized using shared-memory techniques. The resulting parallel implementation of the NAPOM package runs nearly four times faster than the original while adding only minimal load to the MBS cluster.