The relevance of long-range dependence in disk traffic and implications for trace synthesis

Accurate disk workloads are crucial for storage system design, but I/O traces are difficult to obtain, unwieldy to work with, and cannot be parameterized. They are also often bursty and difficult to characterize. Good models of I/O workloads would be extremely useful, but such bursty traces cannot be modeled accurately with Poisson arrivals, that is, with exponentially distributed interarrival times. Much experimental evidence suggests that I/O traces are self-similar, a property that researchers have hoped would help in modeling bursty traces. In this paper, we show that self-similarity at large time scales does not significantly affect disk behavior with respect to response times. This allows us to generate synthetic arrival patterns at relatively small time scales, improving the accuracy of trace generation. The relative error of our method, with input parameters suitable for the workload, ranges from approximately 8% to 12%.
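To make the contrast between Poisson arrivals and bursty arrivals concrete, the following is a minimal sketch, not the method of this paper. It compares the coefficient of variation (CV) of per-bin arrival counts as the bins are aggregated to larger time scales. The Poisson counts smooth out quickly, whereas the counts from a simple multiplicative cascade stay variable across scales. The cascade is only a stand-in for a bursty trace, and all names and parameters (`rate`, `bias`, the number of levels, the aggregation factors) are assumptions chosen for illustration.

```python
import random
import statistics


def poisson_counts(rate, n_bins, rng):
    """Per-bin arrival counts for a Poisson process with exponential gaps."""
    counts = [0] * n_bins
    t = rng.expovariate(rate)
    while t < n_bins:
        counts[int(t)] += 1
        t += rng.expovariate(rate)
    return counts


def cascade_counts(total, levels, bias, rng):
    """Per-bin volumes from a multiplicative cascade (illustrative stand-in).

    Each interval is halved `levels` times; one randomly chosen half
    receives a fraction `bias` of the interval's arrivals.
    """
    counts = [float(total)]
    for _ in range(levels):
        nxt = []
        for c in counts:
            left = bias if rng.random() < 0.5 else 1.0 - bias
            nxt.extend((c * left, c * (1.0 - left)))
        counts = nxt
    return counts


def aggregate(counts, m):
    """Sum consecutive, non-overlapping groups of m bins."""
    return [sum(counts[i:i + m]) for i in range(0, len(counts) - m + 1, m)]


def cv(counts):
    """Coefficient of variation: standard deviation divided by mean."""
    return statistics.pstdev(counts) / statistics.fmean(counts)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_bins = 2 ** 12
poisson = poisson_counts(rate=10.0, n_bins=n_bins, rng=rng)
bursty = cascade_counts(total=10.0 * n_bins, levels=12, bias=0.7, rng=rng)

# Map each aggregation factor m to (Poisson CV, cascade CV).
cv_table = {
    m: (cv(aggregate(poisson, m)), cv(aggregate(bursty, m)))
    for m in (1, 16, 256)
}
for m, (p_cv, b_cv) in cv_table.items():
    print(f"m={m:4d}  poisson CV={p_cv:.3f}  cascade CV={b_cv:.3f}")
```

In this sketch the Poisson CV falls roughly as the inverse square root of the aggregation factor, while the cascade CV declines far more slowly. That slow decline is the kind of persistent variability a Poisson model cannot reproduce.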
