A novel model for synthesizing parallel I/O workloads in scientific applications

Accurately characterizing the I/O demands of data-intensive scientific applications is one of the main challenges in evaluating parallel storage systems through synthetic-trace-driven simulation. This paper analyzes several I/O traces collected from different distributed systems and finds that the correlation structure of parallel I/O inter-arrival times is not consistent across workloads: some traces show little correlation, while others show evident and abundant correlations. Conventional Poisson or Markov arrival processes are therefore inappropriate for modeling I/O arrivals in some applications. Instead, this paper proposes and validates a new, generic model based on the alpha-stable process that accurately models parallel I/O burstiness in workloads with either weak or strong correlations. The model can be used to generate reliable synthetic I/O sequences for simulation studies. Experimental results show that it captures the complex I/O behavior of real storage systems more accurately and faithfully than conventional models, particularly the burstiness of parallel I/O workloads.
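As a rough illustration of why an alpha-stable law can represent burstiness that a Gaussian-type model cannot, the sketch below draws symmetric alpha-stable variates with the Chambers-Mallows-Stuck method and compares how often extreme values occur. It is not the model proposed in the paper: the abstract gives no parameters or construction, so the function names, the choice of alpha = 1.5, the threshold of 5, and the restriction to the symmetric case (beta = 0) are all assumptions made only for this example.

```python
import math
import random


def sample_symmetric_alpha_stable(alpha, rng):
    """Draw one standard symmetric alpha-stable variate (beta = 0, scale = 1)
    with the Chambers-Mallows-Stuck method. Hypothetical helper, not from the paper."""
    if not 0.0 < alpha <= 2.0:
        raise ValueError("alpha must lie in (0, 2]")
    u = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = max(rng.expovariate(1.0), 1e-300)       # unit-mean exponential, kept > 0
    if alpha == 1.0:
        return math.tan(u)                      # Cauchy special case
    # Note: for very small alpha the result can exceed double-precision range.
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def tail_fraction(samples, threshold):
    """Fraction of samples whose magnitude exceeds the threshold."""
    return sum(1 for x in samples if abs(x) > threshold) / len(samples)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    n = 20000
    heavy = [sample_symmetric_alpha_stable(1.5, rng) for _ in range(n)]
    light = [sample_symmetric_alpha_stable(2.0, rng) for _ in range(n)]  # Gaussian case
    print("fraction with |x| > 5, alpha = 1.5:", tail_fraction(heavy, 5.0))
    print("fraction with |x| > 5, alpha = 2.0:", tail_fraction(light, 5.0))
```

With alpha = 2 the sampler reduces to a Gaussian with variance 2, so large excursions are rare; lowering alpha thickens the tails, which is the property that makes the alpha-stable family a candidate for bursty arrivals. Turning such variates into positive, correlated inter-arrival times requires further modeling steps that the abstract does not describe.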
