Cluster-based input/output trace synthesis

I/O traces are crucial for understanding the performance of new storage architectures. Unfortunately, they are extremely bursty and hard to characterize, and they are also large, difficult to obtain, and unwieldy to work with. In this paper, we examine a trace synthesis method based on cluster analysis of the time-varying characteristics of I/O traces: representative trace segments are selected, and synthetic traces are reconstructed from them. We show that the synthetic traces achieve a demerit factor of 5-10% for I/O response times while reducing trace data volume by 75-90%.
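The abstract does not spell out the algorithm. What follows is a minimal Python sketch of the general idea, and every concrete choice in it is an assumption rather than the paper's method. The trace is assumed to be a time-sorted list of hypothetical `(timestamp, size_bytes, is_read)` records. It is cut into fixed-length windows, and each window is summarized by three illustrative features (request count, mean request size, read fraction). The feature vectors are standardized and grouped with plain k-means, and the window nearest each centroid is kept as that cluster's representative. The synthetic trace then replays, for every original window, the representative of its cluster. The `demerit` helper uses one common definition of the demerit figure: the root-mean-square horizontal distance between two response-time CDFs, normalized here by the original mean. That normalization is also an assumption. Lower values mean a closer match.

```python
import math

# Hypothetical record format (assumption): (timestamp, size_bytes, is_read),
# sorted by timestamp.

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def segment_trace(trace, window):
    """Split a time-sorted trace into consecutive windows of `window` time units."""
    if not trace:
        return []
    t0 = trace[0][0]
    segments = [[] for _ in range(int((trace[-1][0] - t0) // window) + 1)]
    for t, size, is_read in trace:
        i = int((t - t0) // window)
        # Store offsets relative to the window start so a segment can be replayed anywhere.
        segments[i].append((t - t0 - i * window, size, is_read))
    return segments

def features(segment):
    """Illustrative per-window features: request count, mean size, read fraction."""
    if not segment:
        return [0.0, 0.0, 0.0]
    n = len(segment)
    return [float(n),
            sum(s for _, s, _ in segment) / n,
            sum(1 for _, _, r in segment if r) / n]

def standardize(rows):
    """Z-score each feature column so no single unit dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, sds)] for r in rows]

def kmeans(points, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    k = min(k, len(points))
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new == labels:
            break  # assignments are stable: converged
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def representatives(points, labels, centroids):
    """For each cluster, the index of the member window nearest its centroid."""
    reps = {}
    for i, (p, lab) in enumerate(zip(points, labels)):
        if lab not in reps or dist2(p, centroids[lab]) < dist2(points[reps[lab]], centroids[lab]):
            reps[lab] = i
    return reps

def synthesize(trace, window, k):
    """Return (synthetic trace, number of records kept in representative windows)."""
    if not trace:
        return [], 0
    segments = segment_trace(trace, window)
    points = standardize([features(s) for s in segments])
    labels, centroids = kmeans(points, k)
    reps = representatives(points, labels, centroids)
    t0 = trace[0][0]
    synthetic = []
    for i, lab in enumerate(labels):
        # Replay this window's cluster representative at window i's position.
        for dt, size, is_read in segments[reps[lab]]:
            synthetic.append((t0 + i * window + dt, size, is_read))
    kept = sum(len(segments[r]) for r in reps.values())
    return synthetic, kept

def demerit(original, synthetic, levels=100):
    """RMS horizontal distance between two empirical CDFs, divided by the original mean."""
    a, b = sorted(original), sorted(synthetic)
    def quantile(xs, p):
        return xs[min(len(xs) - 1, int(p * len(xs)))]
    ps = [(i + 0.5) / levels for i in range(levels)]
    rms = math.sqrt(sum((quantile(a, p) - quantile(b, p)) ** 2 for p in ps) / levels)
    return rms / (sum(a) / len(a))
```

In this sketch the data reduction comes from storing only the representative windows plus one cluster label per window, so `1 - kept / len(trace)` approximates the saved volume. The response times passed to `demerit` would come from replaying the original and synthetic traces through a storage simulator, which is outside the scope of this sketch.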
