I/O performance of the Santos Dumont supercomputer

In this article, we study the I/O performance of the Santos Dumont supercomputer, since the gap between processing and data access speeds causes many applications to spend a large portion of their execution time on I/O operations. For a large-scale, expensive supercomputer, it is essential to ensure that applications achieve the best I/O performance in order to promote efficient usage. We monitor a week of the machine's activity and present a detailed study of the obtained metrics, aiming to provide an understanding of its workload. From experience with one numerical simulation, we identified large I/O performance differences between the MPI implementations available to users. We investigated the phenomenon and narrowed it down to collective I/O operations with small request sizes. For these, we concluded that the customized MPI implementation provided by the machine's vendor (used by more than 20% of the jobs) presents the worst performance. By investigating the issue, we provide information to help improve future MPI-IO collective write implementations, as well as practical guidelines to help users and steer future system upgrades. Finally, we discuss the challenge of describing an application's I/O behavior without depending on information from users. Such a description allows for identifying the application's I/O bottlenecks and proposing ways of improving its I/O performance. We propose a methodology to do so and use GROMACS, the application with the largest number of jobs in 2017, as a case study.
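To make the problematic access pattern concrete, the following is a minimal sketch of an MPI-IO collective write with small per-rank requests, the kind of operation in which the compared MPI implementations diverge. The file name, block size, and data values are illustrative assumptions, not details taken from the study.

```c
/* Sketch: each rank issues a collective write of a small contiguous block.
 * Compile with an MPI compiler wrapper, e.g. `mpicc small_collective.c`. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Small request size: each rank writes only 1 KiB per collective call
     * (256 four-byte integers); chosen for illustration only. */
    const int count = 256;
    int buf[256];
    for (int i = 0; i < count; i++)
        buf[i] = rank;

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "collective_small.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Rank-interleaved contiguous blocks. The MPI-IO layer may aggregate
     * these small requests (e.g., via two-phase I/O), and how well it does
     * so is where collective-write implementations can differ. */
    MPI_Offset offset = (MPI_Offset)rank * count * sizeof(int);
    MPI_File_write_at_all(fh, offset, buf, count, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```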
