An Experimental Evaluation of I/O Optimizations on Different Applications

Many large-scale applications have significant I/O requirements in addition to their computational and memory requirements. Unfortunately, the limited number of I/O nodes provided in a typical configuration of modern message-passing distributed-memory architectures, such as the Intel Paragon and the IBM SP-2, severely limits the I/O performance of these applications. In this paper, we examine several software optimization techniques and evaluate their effects on five different I/O-intensive codes drawn from both small and large application domains. Our goals in this study are twofold. First, we want to understand the behavior of large-scale data-intensive applications and the impact of I/O subsystems on their performance and vice versa. Second, and more importantly, we strive to identify solutions that improve application performance through a mix of software techniques. Our results reveal that different applications benefit from different optimizations. For example, we found that some applications benefit from file layout optimizations whereas others take advantage of collective I/O. A combination of architectural and software solutions is normally needed to obtain good I/O performance. For example, we show that with a limited number of I/O resources, it is possible to obtain good performance by using appropriate software optimizations. We also show that beyond a certain level, imbalance in the architecture results in performance degradation even when using optimized software, thereby indicating the necessity of an increase in I/O resources.
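The collective I/O mentioned above is commonly realized with the two-phase method: processes first exchange data so that each one owns a contiguous file region, then each issues a single large write. The following is a conceptual plain-Python sketch of that idea, not the paper's implementation or actual MPI code; the "processes" are list indices and the "file" is an in-memory list, and all names here are illustrative.

```python
# Conceptual sketch of two-phase collective I/O (one of the optimizations
# the paper evaluates). This simulates the idea in plain Python: phase 1
# redistributes non-contiguous data among "processes" so each owns one
# contiguous file region; phase 2 writes each region with a single large
# request instead of many small strided ones.

def two_phase_collective_write(global_len, num_procs, owned):
    """owned[p] maps process p to its (file_index, value) pairs, which
    may be non-contiguous in the file. Returns the resulting file image
    and the number of write requests issued."""
    # Phase 1: redistribution. Process p becomes responsible for the
    # contiguous file range [p*chunk, (p+1)*chunk).
    chunk = global_len // num_procs
    staged = [dict() for _ in range(num_procs)]
    for p in range(num_procs):
        for idx, val in owned[p]:
            dest = min(idx // chunk, num_procs - 1)
            staged[dest][idx] = val  # simulates an all-to-all exchange

    # Phase 2: each process writes its contiguous region in one request.
    file_image = [None] * global_len
    writes = 0
    for p in range(num_procs):
        for idx, val in sorted(staged[p].items()):
            file_image[idx] = val
        writes += 1  # one large sequential write per process
    return file_image, writes

# Cyclic distribution: process p owns file elements p, p+4, p+8, ...
data = [[(i, i * 10) for i in range(p, 16, 4)] for p in range(4)]
image, writes = two_phase_collective_write(16, 4, data)
```

With four processes and a cyclically distributed 16-element array, a naive approach would issue 16 small strided writes; the two-phase scheme issues only 4 large ones, which is the source of its bandwidth advantage on architectures with few I/O nodes.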
