High-level buffering for hiding periodic output cost in scientific simulations

Scientific applications often need to write large arrays and associated metadata periodically for visualization or restart purposes. In this paper, we present active buffering, a high-level, transparent buffering scheme for collective I/O in which processors actively organize their idle memory into a hierarchy of buffers for periodic output data. The scheme exploits whatever memory is idle but makes no assumption about how much memory is available at run time. Active buffering can perform I/O in the background while computation proceeds, can be extended to remote I/O for more efficient data migration, and can be implemented portably in today's parallel I/O libraries. It can also mask performance problems of the scientific data formats that many scientists use. Performance experiments with both synthetic benchmarks and real simulation codes on multiple platforms show that active buffering can greatly reduce the I/O cost visible to the application.
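
The core idea can be illustrated with a minimal single-process sketch. It is not the authors' implementation: it assumes one buffer level rather than a hierarchy, a fixed memory budget, and a generic `sink` object with a `write` method, and the names `ActiveBuffer`, `capacity_bytes`, and `sink` are hypothetical. Output that fits in the budget is copied into memory and written by a background thread; output that does not fit falls back to a synchronous write, so no amount of free memory is assumed.

```python
import queue
import threading


class ActiveBuffer:
    """Hypothetical sketch: buffer output in spare memory, drain it in the background."""

    def __init__(self, sink, capacity_bytes):
        self.sink = sink                  # any object with a write(bytes) method
        self.capacity = capacity_bytes    # assumed budget of idle memory
        self.used = 0                     # bytes currently held in the buffer
        self.lock = threading.Lock()      # guards self.used
        self.pending = queue.Queue()      # chunks waiting for the background writer
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def write(self, data):
        """Return True if the data was buffered, False if written synchronously."""
        with self.lock:
            fits = self.used + len(data) <= self.capacity
            if fits:
                self.used += len(data)
        if fits:
            # Copy the data so the caller may reuse its array immediately.
            self.pending.put(bytes(data))
            return True
        # Not enough idle memory: wait for buffered chunks to reach the sink
        # (this preserves output order), then write directly.
        self.pending.join()
        self.sink.write(data)
        return False

    def _drain(self):
        """Background writer: move buffered chunks to the sink."""
        while True:
            chunk = self.pending.get()
            if chunk is None:             # sentinel posted by close()
                self.pending.task_done()
                return
            self.sink.write(chunk)
            with self.lock:
                self.used -= len(chunk)
            self.pending.task_done()

    def close(self):
        """Flush all remaining chunks and stop the background writer."""
        self.pending.put(None)
        self.worker.join()
```

In this sketch the caller sees only the cost of a memory copy whenever the output fits in the budget, which is the sense in which the I/O cost is hidden. The sketch assumes a single producer thread calls `write`; a real collective I/O library would also have to coordinate buffers across processors.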
