Using MPI file caching to improve parallel write performance for large-scale scientific applications

Typical large-scale scientific applications periodically write checkpoint files to save the computational state throughout execution. Existing parallel file systems improve such write-only I/O patterns through client-side file caching and write-behind strategies. In distributed environments where files are rarely accessed by more than one client concurrently, file caching has achieved significant success; in parallel applications where multiple clients manipulate a shared file, however, cache coherence control can serialize I/O. We have designed a thread-based caching layer for the MPI I/O library that places a portable caching system closer to the user application, where more information about the application's I/O patterns is available for better coherence control. We demonstrate the impact of our caching solution on parallel write performance with a comprehensive evaluation that includes a set of widely used I/O benchmarks and production application I/O kernels.
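
The abstract does not give implementation details, but the write-behind idea behind the caching layer can be illustrated with a small sketch. The C fragment below is a hypothetical illustration only, not the authors' code or the MPI-IO internals: a background I/O thread drains a per-process staging buffer with MPI_File_write_at while the compute thread continues. All identifiers (wb_cache, wb_write, io_thread_main, WB_CAPACITY), the 4 MiB buffer size, and the 10 ms flush interval are assumptions made for illustration; a real layer inside the MPI-IO library would additionally coordinate coherence across processes, which this sketch omits.

/* Minimal write-behind sketch: an I/O thread flushes a local staging
 * buffer with MPI-IO while the compute thread continues.  Assumes MPI
 * was initialized with MPI_Init_thread requesting MPI_THREAD_MULTIPLE.
 * All identifiers here are hypothetical, not the paper's API. */
#include <mpi.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define WB_CAPACITY (4 * 1024 * 1024)   /* size of the local cache page */

typedef struct {
    MPI_File        fh;        /* file handle the cache is bound to      */
    char           *buf;       /* staging buffer for write-behind        */
    MPI_Offset      base;      /* file offset of the first buffered byte */
    size_t          len;       /* number of bytes currently buffered     */
    int             shutdown;  /* set by the compute thread at close     */
    pthread_mutex_t lock;
    pthread_t       io_thread;
} wb_cache;

/* Drain the buffer with one contiguous MPI-IO write (lock held). */
static void wb_flush(wb_cache *c)
{
    if (c->len > 0) {
        MPI_File_write_at(c->fh, c->base, c->buf, (int)c->len,
                          MPI_BYTE, MPI_STATUS_IGNORE);
        c->len = 0;
    }
}

/* Background I/O thread: periodically pushes buffered data to the
 * file system so the compute thread never blocks on the write. */
static void *io_thread_main(void *arg)
{
    wb_cache *c = (wb_cache *)arg;
    for (;;) {
        pthread_mutex_lock(&c->lock);
        int done = c->shutdown;
        wb_flush(c);
        pthread_mutex_unlock(&c->lock);
        if (done)
            break;
        struct timespec ts = { 0, 10 * 1000 * 1000 };  /* 10 ms */
        nanosleep(&ts, NULL);
    }
    return NULL;
}

/* Buffer a write behind the application's back; flush first when the
 * request is not adjacent to the cached range or would not fit.
 * (Requests larger than WB_CAPACITY would need a write-through path.) */
static void wb_write(wb_cache *c, MPI_Offset offset,
                     const void *data, size_t nbytes)
{
    pthread_mutex_lock(&c->lock);
    int contiguous = (c->len == 0) ||
                     (offset == c->base + (MPI_Offset)c->len);
    if (!contiguous || c->len + nbytes > WB_CAPACITY)
        wb_flush(c);
    if (c->len == 0)
        c->base = offset;
    memcpy(c->buf + c->len, data, nbytes);
    c->len += nbytes;
    pthread_mutex_unlock(&c->lock);
}

/* Bind the cache to an already opened MPI file and start the I/O thread. */
static void wb_open(wb_cache *c, MPI_File fh)
{
    c->fh = fh;
    c->buf = malloc(WB_CAPACITY);
    c->base = 0;
    c->len = 0;
    c->shutdown = 0;
    pthread_mutex_init(&c->lock, NULL);
    pthread_create(&c->io_thread, NULL, io_thread_main, c);
}

/* Request a final flush, stop the I/O thread, and release the buffer. */
static void wb_close(wb_cache *c)
{
    pthread_mutex_lock(&c->lock);
    c->shutdown = 1;
    pthread_mutex_unlock(&c->lock);
    pthread_join(c->io_thread, NULL);
    pthread_mutex_destroy(&c->lock);
    free(c->buf);
}

In this arrangement the application's checkpoint writes return as soon as the data is copied into the staging buffer, and the actual file-system traffic overlaps with computation, which is the write-behind benefit the abstract describes.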
