Predicting disk I/O time of HPC applications on flash drives

As the gap between the speed of computing elements and the disk subsystem widens, it becomes increasingly important to understand and model disk I/O. While the speed of computational resources continues to grow, potentially scaling to multiple petaflops and millions of cores, the performance of I/O systems lags well behind. In this context, data-intensive applications that run on current and future systems depend on the ability of the I/O system to move data to the distributed memories, and the I/O system becomes a bottleneck for application performance. Additionally, because larger scales carry a higher risk of component failure, the frequency of application checkpointing is expected to grow and place an additional burden on the disk I/O system [1]. The emergence of new technologies such as flash-based Solid State Drives (SSDs) presents an opportunity to narrow the gap between the speed of computing and I/O systems. With this in mind, SDSC's PMAC lab is investigating the use of flash drives in a new prototype system called DASH [8, 9, 13]. In this paper we apply and extend a modeling methodology developed for spinning disks and use it to model disk I/O time on DASH. We studied two data-intensive applications, MADbench2 [6] and a geological imaging application [5]. Our results show a prediction error for total I/O time of 14.79% for MADbench2; for the geological imaging application, our efforts so far yield an error of 9% for one category of read calls (the application performs three categories of read/write calls in total). We are still investigating the geological application, and in this paper we present our results thus far for both applications.
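The abstract only summarizes the methodology, but the cited modeling work [9] and the PEBIL instrumentation tool [10] suggest a trace-and-benchmark approach: record the application's I/O calls, measure the target system's throughput with a synthetic benchmark, and combine the two to predict total I/O time. The following Python sketch illustrates that general idea only; it is not the paper's implementation, and all request sizes, bandwidth numbers, and function names are illustrative assumptions.

```python
# Hypothetical sketch: predict total I/O time by combining an application's
# I/O trace with throughput measured on the target system by a synthetic
# benchmark. All names and numbers below are illustrative, not from the paper.
import bisect

# Benchmarked on the target system: request size (bytes) -> sustained
# bandwidth (bytes/second), per operation type.
bench_sizes = [4_096, 65_536, 1_048_576, 16_777_216]
bench_bandwidth = {
    "read":  [35e6, 180e6, 650e6, 900e6],   # illustrative SSD numbers
    "write": [20e6, 120e6, 400e6, 600e6],
}

def predicted_bandwidth(op: str, size: int) -> float:
    """Look up bandwidth for the smallest measured request size at or
    above `size`, clamped to the largest measured size."""
    i = bisect.bisect_left(bench_sizes, size)
    i = min(i, len(bench_sizes) - 1)
    return bench_bandwidth[op][i]

def predict_io_time(trace: list[tuple[str, int]]) -> float:
    """Sum per-call times (bytes / bandwidth) over a traced sequence of I/O calls."""
    return sum(size / predicted_bandwidth(op, size) for op, size in trace)

# Toy trace: (operation, bytes transferred) per call, as a binary
# instrumentation tool such as PEBIL [10] might record.
trace = [("read", 1_048_576)] * 100 + [("write", 16_777_216)] * 10
print(f"Predicted I/O time: {predict_io_time(trace):.3f} s")
```

Grouping calls by operation and request size mirrors the abstract's per-category error reporting: a model of this shape naturally yields one prediction error per category of read/write calls.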

[1] Leonid Oliker et al. HPC global file system performance analysis using a scientific-application derived benchmark. Parallel Computing, 2009.

[2] Julian Borrill. MADCAP: The Microwave Anisotropy Dataset Computational Analysis Package. 1999.

[3] J. May. Pianola: A script-based I/O benchmark. In 3rd Petascale Data Storage Workshop (PDSW), 2008.

[4] Jeffrey Bennett et al. DASH-IO: an empirical study of flash-based IO for HPC. 2010.

[5] John Shalf et al. Characterizing and predicting the I/O performance of HPC applications using a parameterized synthetic benchmark. In International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2008.

[6] J. Wolfe et al. Acoustic wavefront imaging. 1995.

[7] Michael L. Norman et al. Accelerating data-intensive science with Gordon and Dash. 2010.

[8] Sandeep K. S. Gupta et al. DASH: a Recipe for a Flash-based Data Intensive Supercomputer. In ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2010.

[9] Michael A. Laurenzano et al. Modeling and Predicting Disk I/O Time of HPC Applications. In DoD High Performance Computing Modernization Program Users Group Conference (HPCMP-UGC), 2010.

[10] Michael Laurenzano et al. PEBIL: Efficient static binary instrumentation for Linux. In IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), 2010.

[11] David R. O'Hallaron et al. //TRACE: Parallel Trace Replay with Approximate Causal Events. In USENIX Conference on File and Storage Technologies (FAST), 2007.

[12] Seetharami R. Seelam et al. Modeling the Impact of Checkpoints on Next-Generation Systems. In 24th IEEE Conference on Mass Storage Systems and Technologies (MSST), 2007.