An Empirical Exploration of Black-Box Performance Models for Storage Systems

The effectiveness of automatic storage management depends on the accuracy of the storage performance models used to make resource allocation decisions. Several modeling approaches have been proposed. Among them, black-box approaches are the most promising for real-world storage systems because they require minimal device-specific information and are self-evolving with respect to changes in the system. However, black-box techniques have traditionally been considered inaccurate and non-converging in real-world systems. This paper evaluates a popular off-the-shelf black-box technique for modeling a real-world storage environment. We measured the accuracy of performance predictions in single-workload and multiple-workload environments. We also analyzed the accuracy of different performance metrics, namely throughput, latency, and detection of the saturation state. By empirically exploring improvements to model accuracy, we discovered that limiting component model training to the non-saturated zone and taking into account the number of outstanding I/O requests yields an error rate of 4.5% for the throughput model and 19.3% for the latency model. We also discovered that, for systems with multiple workloads, it is necessary to include the access characteristics of each workload as input parameters to the model. Lastly, we report results on the sensitivity of model accuracy to the amount of bootstrapping data.
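
The two adjustments described in the abstract (training only on non-saturated samples, and using the number of outstanding I/O requests as a model input) can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: the `Sample` fields, the `synthetic_samples` generator, and the toy device behavior are all hypothetical. The small regression tree is only a stand-in for an off-the-shelf black-box learner, since the abstract does not name the technique used.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    request_size_kb: float   # hypothetical workload feature
    outstanding_ios: int     # queue depth observed during the interval
    throughput_iops: float   # measured target value
    saturated: bool          # True if the device was saturated


def to_row(sample, use_outstanding):
    """Build the feature vector, optionally including outstanding I/Os."""
    row = [sample.request_size_kb]
    if use_outstanding:
        row.append(float(sample.outstanding_ios))
    return row


def fit_tree(rows, targets, min_leaf=1):
    """Fit a tiny regression tree by minimizing squared error per split."""
    if len(rows) < 2 * min_leaf or len(set(targets)) == 1:
        return {"leaf": mean(targets)}
    best = None
    for j in range(len(rows[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[j] <= t]
            right = [y for r, y in zip(rows, targets) if r[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = mean(left), mean(right)
            sse = (sum((y - ml) ** 2 for y in left)
                   + sum((y - mr) ** 2 for y in right))
            if best is None or sse < best[0]:
                best = (sse, j, t)
    if best is None:  # no feature can separate the remaining rows
        return {"leaf": mean(targets)}
    _, j, t = best
    lo = [(r, y) for r, y in zip(rows, targets) if r[j] <= t]
    hi = [(r, y) for r, y in zip(rows, targets) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": fit_tree([r for r, _ in lo], [y for _, y in lo], min_leaf),
        "right": fit_tree([r for r, _ in hi], [y for _, y in hi], min_leaf),
    }


def predict(tree, row):
    """Walk the tree from the root to a leaf and return its mean value."""
    while "leaf" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]


def train_and_score(samples, use_outstanding):
    """Train on non-saturated samples only; return mean relative error."""
    kept = [s for s in samples if not s.saturated]
    rows = [to_row(s, use_outstanding) for s in kept]
    targets = [s.throughput_iops for s in kept]
    tree = fit_tree(rows, targets)
    return mean(abs(predict(tree, r) - y) / y for r, y in zip(rows, targets))


def synthetic_samples():
    """Toy device (an assumption): throughput grows with depth, then caps."""
    samples = []
    for size_kb, per_io in ((4, 200.0), (64, 50.0)):
        cap = per_io * 8
        for depth in (1, 2, 4, 8, 16, 32):
            iops = min(per_io * depth, cap)
            samples.append(Sample(size_kb, depth, iops, per_io * depth >= cap))
    return samples


if __name__ == "__main__":
    data = synthetic_samples()
    print("with outstanding I/Os:   ", train_and_score(data, True))
    print("without outstanding I/Os:", train_and_score(data, False))
```

For brevity the sketch scores the model on the same non-saturated samples it was trained on. A real evaluation would use held-out measurements, and the error figures it prints say nothing about the results reported in the paper.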
