Profile-guided I/O partitioning

In high-performance computing there is a growing need to process large, complex datasets. Many of these applications are file-intensive workloads that perform a large number of reads from and writes to a small number of files. When such workloads run on cluster-based systems, performance does not scale simply by adding compute nodes. To exploit parallel resources effectively, file I/O itself must be parallelized. The potential benefit of parallel I/O grows as the gap between CPU and disk speeds continues to widen.

Parallel I/O middleware systems (e.g., MPI I/O) provide environments in which large datasets can be shared among multiple distributed processes. Even so, the performance of file-intensive applications depends heavily on how the data is accessed and where it is physically located on disk. I/O operations need to be parallelized both at the application level (using middleware) and at the disk level (using partitioning).

In this paper, we present a new profile-guided greedy partitioning algorithm that parallelizes I/O access for file-intensive applications run on cluster-based systems. We use MPI and MPI I/O to provide parallelization at the application level. We use I/O profiling to capture relevant information about the I/O stream, and then use these profiles to guide file partitioning across multiple disks to significantly improve I/O throughput.
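As a rough illustration of how an I/O profile could guide partitioning, the sketch below counts accesses per file chunk in a recorded trace and then places chunks on disks greedily, most-accessed first, always onto the disk with the lowest accumulated load. This is not the algorithm from the paper, which the abstract does not specify. The function name greedy_partition, the chunk-level granularity, the access-count cost model, and the sample trace are all assumptions made for the example.

```python
from collections import Counter


def greedy_partition(profile, num_disks):
    """Hypothetical helper: map each chunk to a disk using a greedy rule.

    profile   -- dict of chunk id -> access count taken from an I/O trace
    num_disks -- number of disks available for placement
    Returns (placement, load): chunk id -> disk index, and per-disk load.
    """
    load = [0] * num_disks
    placement = {}
    # Visit chunks from most to least accessed; ties are broken by chunk id
    # so the result is deterministic.
    ordered = sorted(profile.items(), key=lambda kv: (-kv[1], kv[0]))
    for chunk, count in ordered:
        # Pick the disk with the smallest accumulated access count so far.
        disk = min(range(num_disks), key=lambda d: load[d])
        placement[chunk] = disk
        load[disk] += count
    return placement, load


# Assumed sample trace: the sequence of chunk ids touched by an application.
trace = [0, 0, 1, 2, 0, 3, 1, 0, 2, 4]
profile = Counter(trace)
placement, load = greedy_partition(profile, num_disks=2)
print(placement, load)
```

Access count is only one possible cost. A real profile would likely also record request sizes, read/write mix, and which process issued each request, and the placement rule would need to account for accesses that occur at the same time.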
