I/O characterization on a parallel file system

In this paper we present a study of the I/O access patterns of scientific and general applications on a parallel file system. Understanding I/O access patterns is essential to designing a file system effectively. Supercomputing applications running on parallel systems make extensive use of parallel file systems and gain faster data access by requesting information from multiple nodes simultaneously. However, a parallel file system can become a bottleneck if its file distribution parameters do not fit the access scheme of the applications. In our work, we examine a variety of such applications and measure the inter-arrival times, I/O request sizes, and burstiness of the load they place on a parallel file system. Our tests were conducted on the open source PVFS parallel file system with different configurations of metadata servers and I/O nodes. Among the findings are that the standard assumption of Poisson or random inter-arrival times is not justified and that access sizes are smaller than would be expected for a parallel application.
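To make the Poisson question concrete, the following is a minimal sketch of how inter-arrival times and burstiness could be checked from a list of request timestamps. It is not the measurement method used in the study. The function names, the trace format (a plain list of timestamps in seconds), and the choice of statistics are all assumptions made for illustration. The sketch relies on two standard properties of a Poisson arrival process: its inter-arrival times are exponentially distributed, so their coefficient of variation is close to 1, and its per-window request counts have a variance-to-mean ratio close to 1. Values well above 1 suggest bursty arrivals.

```python
import statistics


def interarrival_times(timestamps):
    """Return the gaps between consecutive request timestamps (hypothetical trace format)."""
    ts = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def coefficient_of_variation(gaps):
    """Standard deviation of the gaps divided by their mean.

    Close to 1 for exponentially distributed gaps (Poisson arrivals),
    0 for perfectly regular arrivals, and above 1 for bursty arrivals.
    """
    return statistics.pstdev(gaps) / statistics.fmean(gaps)


def index_of_dispersion(timestamps, window):
    """Variance-to-mean ratio of request counts in fixed-length windows.

    Close to 1 for a Poisson process; larger values indicate burstiness.
    The window length is an illustrative parameter, in the same unit as the timestamps.
    """
    ts = sorted(timestamps)
    start = ts[0]
    n_windows = int((ts[-1] - start) // window) + 1
    counts = [0] * n_windows
    for t in ts:
        counts[int((t - start) // window)] += 1
    return statistics.pvariance(counts) / statistics.fmean(counts)


if __name__ == "__main__":
    # Made-up timestamps (seconds): two short bursts separated by a long idle gap.
    trace = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
    gaps = interarrival_times(trace)
    print("CV of inter-arrival times:", coefficient_of_variation(gaps))
    print("Index of dispersion (1 s windows):", index_of_dispersion(trace, 1.0))
```

Both statistics are only rough indicators. A formal goodness-of-fit test against the exponential distribution would be needed before rejecting the Poisson assumption for a real trace.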
