Parallel Architecture and its Performance of Oceanic Global Circulation Model Based on MOM3 to be Run on the Earth Simulator

Abstract. In this study, we present the latest results from evaluating OFES, our computationally optimized code based on MOM3, on the Earth Simulator. An O(10)-year integration at a horizontal resolution of 0.1 degree will be among the first attempts at a scientific simulation of this scale. To preserve the flexibility of MOM3 from a scientific point of view, we consider two types of parallel architecture, chosen according to the resolution required to represent the oceanic phenomena of interest. The first, intended for relatively low-resolution phenomena with longer integration times, uses the shared memory system to improve parallel performance within a single node composed of 8 PEs. To achieve the most efficient parallel computation inside a node, we reimplemented the MPI library as an assembly-coded library. The second, intended for the ultra-high-resolution case of 0.1 degree in the horizontal, relies solely on communication through the MPI library, which does not distinguish between communication inside and outside a node. In this case, we took into account the amount of computation in the halo region in order to attain highly parallelized performance. As a result, we achieved a speedup of more than about 500 times relative to the CPU time on a single node. No load imbalance was observed. In this paper, we describe the optimization strategy used in each of the two cases to attain the target performance, together with measurement results from the Earth Simulator. The experiments for the ultra-high-resolution case were carried out on 188 nodes, comprising 1500 PEs.
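
To illustrate why the amount of computation in the halo region matters at high process counts, the following is a minimal sketch, not the OFES implementation. It assumes a simple one-dimensional decomposition of the horizontal grid by rows and a halo one row wide; the function names split_rows and halo_overhead, the decomposition strategy, and the halo width are all hypothetical choices made for illustration only.

```python
def split_rows(n_rows, n_procs):
    """Return (start, stop) row ranges, as evenly sized as possible."""
    base, extra = divmod(n_rows, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks receive one additional row.
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def halo_overhead(n_rows, n_cols, n_procs, halo=1):
    """Largest ratio of halo cells to interior cells over all subdomains.

    Assumes (hypothetically) a 1-D row decomposition in which each
    subdomain carries a halo strip on every side that borders a neighbour.
    """
    worst = 0.0
    for rank, (start, stop) in enumerate(split_rows(n_rows, n_procs)):
        interior = (stop - start) * n_cols
        sides = (1 if rank > 0 else 0) + (1 if rank < n_procs - 1 else 0)
        halo_cells = sides * halo * n_cols
        worst = max(worst, halo_cells / interior)
    return worst
```

Calling halo_overhead with a fixed grid size and an increasing number of processes shows the ratio growing as the subdomains shrink, which is the reason halo work cannot be neglected when the same grid is spread over on the order of a thousand PEs.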