Synthesizing Representative I/O Workloads for TPC-H

Synthetic I/O requests that accurately capture workload behavior are extremely valuable for the design, implementation, and optimization of disk subsystems. This paper presents a synthetic workload generator for TPC-H, an important commercial decision-support workload, built by completely characterizing the arrival and access patterns of its queries. We present a novel approach for parameterizing the behavior of intermingled streams of sequential requests, and we exploit correlations between multiple attributes of these requests to generate disk block-level traces. These traces are shown to accurately mimic the response time characteristics of a real trace for each TPC-H query.
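
To make the notion of intermingled sequential streams concrete, the following is a minimal sketch of a block-level trace generator and is not the generator described in this paper. The names `Request` and `synth_trace`, the parameters (`n_streams`, `req_blocks`, `mean_gap_ms`, `disk_blocks`), the exponential inter-arrival times, and the uniform choice of stream are all assumptions made for illustration; in particular, the sketch does not model the correlations between request attributes that the paper exploits.

```python
import random
from dataclasses import dataclass


@dataclass
class Request:
    time_ms: float  # arrival time of the request
    lba: int        # starting logical block address
    blocks: int     # request size in blocks
    stream: int     # index of the sequential stream that issued it


def synth_trace(n_requests, n_streams=4, req_blocks=8,
                mean_gap_ms=2.0, disk_blocks=1_000_000, seed=0):
    """Interleave several sequential streams into one block-level trace.

    All parameters are hypothetical knobs, not values fitted to TPC-H.
    """
    rng = random.Random(seed)
    # Each stream starts at a random offset, then advances sequentially.
    next_lba = [rng.randrange(disk_blocks) for _ in range(n_streams)]
    now = 0.0
    trace = []
    for _ in range(n_requests):
        # Assumed arrival model: exponential gaps with the given mean.
        now += rng.expovariate(1.0 / mean_gap_ms)
        # Assumed interleaving: the issuing stream is chosen uniformly.
        s = rng.randrange(n_streams)
        trace.append(Request(now, next_lba[s], req_blocks, s))
        # Advance this stream; wrap around at the end of the disk.
        next_lba[s] = (next_lba[s] + req_blocks) % disk_blocks
    return trace


if __name__ == "__main__":
    for r in synth_trace(5, n_streams=2):
        print(f"{r.time_ms:8.3f} ms  stream={r.stream}  lba={r.lba}  blocks={r.blocks}")
```

Viewed as a whole, the resulting trace looks non-sequential because the streams are interleaved, yet each stream taken on its own is strictly sequential. That per-stream structure is what a parameterization of intermingled streams has to capture.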
