Data layout optimization for petascale file systems

In this study, the authors propose a simple performance model to promote better integration between the parallel I/O middleware layer and parallel file systems. They show that application-specific data layout optimization can considerably reduce overall data access delay for many applications. Implementation results under the MPI-IO middleware and the PVFS2 file system confirm the correctness and effectiveness of their approach and demonstrate the potential of data layout optimization in petascale data storage.
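
The abstract does not spell out the performance model. Purely as an illustration of why layout should depend on the application, the sketch below uses a generic startup-plus-transfer cost model. This is an assumption, not necessarily the authors' model. A request is striped evenly over k servers, and its delay is a fixed startup cost, plus a per-server coordination cost, plus the per-server share of the transfer time. The names `CostParams`, `startup`, `per_byte`, `per_server`, `access_time` and `best_server_count` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    # All three parameters are hypothetical inputs to an illustrative model.
    startup: float     # fixed startup time per request (seconds)
    per_byte: float    # transfer time per byte on one server (seconds/byte)
    per_server: float  # extra client-side cost per server contacted (seconds)


def access_time(request_bytes: int, servers: int, p: CostParams) -> float:
    """Modeled delay when one request is striped evenly over `servers` servers."""
    if servers < 1:
        raise ValueError("servers must be >= 1")
    # Wider striping shrinks each server's share of the data,
    # but adds coordination cost for every extra server.
    return p.startup + p.per_server * servers + p.per_byte * request_bytes / servers


def best_server_count(request_bytes: int, max_servers: int, p: CostParams) -> int:
    """Exhaustively choose the server count that minimizes the modeled delay."""
    return min(range(1, max_servers + 1),
               key=lambda k: access_time(request_bytes, k, p))


if __name__ == "__main__":
    params = CostParams(startup=1e-4, per_byte=1e-8, per_server=5e-4)
    for size in (4096, 64 * 1024 * 1024):
        print(size, best_server_count(size, 64, params))
```

Under these made-up parameters, a 4 KiB request is cheapest on a single server, while a 64 MiB request benefits from striping over several dozen servers. A single fixed layout therefore cannot suit every access pattern, which is the intuition behind choosing the layout per application.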
