Pattern-aware file reorganization in MPI-IO

Scientific computing is becoming more data-intensive, but I/O throughput is not growing at the same rate. MPI-IO and parallel file systems are expected to help bridge the gap by increasing data access parallelism. Compared to traditional I/O systems, parallel I/O systems are more sensitive to factors such as the number of requests and the contiguity of accesses, and variation in these factors can lead to significant differences in performance. Programmers usually arrange data in a logical fashion for ease of programming and data manipulation; however, this arrangement may not be ideal for parallel I/O systems. If the organization of the file and the behavior of the I/O system are not taken into account, performance may degrade badly. In this paper, a novel method of reorganizing files at the I/O middleware level is proposed, which takes access patterns into account. By placing data in a way that favors the parallel I/O system, gains of up to two orders of magnitude in reading and up to one order of magnitude in writing were observed with spinning disks and solid-state disks.
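To make the role of request count and contiguity concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes a hypothetical strided pattern in which each of `nprocs` processes reads every `nprocs`-th fixed-size record, and a hypothetical `remap` function that places each process's records next to each other. All function names and parameters here are illustrative assumptions.

```python
def count_requests(offsets, record_size):
    """Count contiguous runs; adjacent records merge into a single request."""
    if not offsets:
        return 0
    offsets = sorted(offsets)
    runs = 1
    for prev, cur in zip(offsets, offsets[1:]):
        if cur != prev + record_size:
            runs += 1
    return runs


def strided_offsets(rank, nprocs, nrecords, record_size):
    """Assumed original layout: a rank reads records rank, rank+nprocs, ..."""
    return [i * record_size for i in range(rank, nrecords, nprocs)]


def remap(offset, nprocs, nrecords, record_size):
    """Map an original offset to a layout where each rank's records are contiguous."""
    index = offset // record_size
    rank, slot = index % nprocs, index // nprocs
    # Number of records owned by all lower-numbered ranks.
    base = sum(len(range(r, nrecords, nprocs)) for r in range(rank))
    return (base + slot) * record_size


if __name__ == "__main__":
    nprocs, nrecords, record_size = 4, 16, 8
    for rank in range(nprocs):
        original = strided_offsets(rank, nprocs, nrecords, record_size)
        reorganized = [remap(o, nprocs, nrecords, record_size) for o in original]
        print(rank,
              count_requests(original, record_size),
              count_requests(reorganized, record_size))
```

Under these assumptions, each process issues one request per record in the original layout and a single request after remapping. A real middleware layer would also need to detect the pattern and translate offsets transparently, which this sketch does not attempt.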
