Predicting the Bursts of Data Access Streams by Filtering Correlated I/Os

Bursty behavior normally means that the workload generated by data accesses arrives in short, uneven spurts. To handle the bursts, the physical resources of IT devices have to be configured with capacity far beyond the average resource utilization so that performance requirements are still met at peak load. However, this kind of fat provisioning wastes resources whenever the system is not experiencing peak workloads. If the bursts can be predicted in advance, thin provisioning can save substantial resources compared with fat provisioning. Bursty data access, however, mixes correlated I/Os with non-correlated I/Os, so effectively predicting the bursts has long been a challenge. By analyzing real traces, this paper observes that non-correlated block I/Os dominate bursts across I/O workloads. Based on this observation, this paper proposes the SAW-Apriori algorithm, which mines the frequent and correlated I/Os by enhancing the temporal locality of the traditional Apriori algorithm. Furthermore, this paper proposes to predict the bursts by filtering out those frequent and correlated I/Os. Experimental results demonstrate that the proposed approach significantly outperforms the traditional time series method when predicting the bursts.
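
The filtering idea can be illustrated with a minimal Python sketch. It is not the SAW-Apriori algorithm itself, whose details the abstract does not give; it is a simplified Apriori-style pass limited to block pairs, with a time window standing in for temporal locality. The trace layout (a time-sorted list of `(timestamp, block)` tuples), the function names, and the `window`, `min_support`, and `interval` parameters are all assumptions made for illustration.

```python
from collections import Counter


def frequent_pairs(trace, window, min_support):
    """Return block pairs that co-occur within `window` seconds at least
    `min_support` times. `trace` is assumed sorted by timestamp."""
    # Apriori pass 1: keep only blocks that are frequent on their own.
    singles = Counter(block for _, block in trace)
    frequent = {b for b, c in singles.items() if c >= min_support}

    # Apriori pass 2: count candidate pairs built from frequent blocks,
    # restricted to accesses that fall inside the same time window.
    pairs = Counter()
    for i, (t_i, b_i) in enumerate(trace):
        if b_i not in frequent:
            continue
        seen = set()  # count each partner block once per anchor access
        for j in range(i + 1, len(trace)):
            t_j, b_j = trace[j]
            if t_j - t_i > window:
                break
            if b_j in frequent and b_j != b_i and b_j not in seen:
                seen.add(b_j)
                pairs[frozenset((b_i, b_j))] += 1
    return {p for p, c in pairs.items() if c >= min_support}


def filter_correlated(trace, pairs):
    """Drop every access to a block that appears in a frequent pair."""
    correlated = set().union(*pairs) if pairs else set()
    return [(t, b) for t, b in trace if b not in correlated]


def counts_per_interval(trace, interval):
    """Arrival counts per fixed-length interval, keyed by interval index."""
    return Counter(int(t // interval) for t, _ in trace)
```

Under these assumptions, the per-interval counts of the filtered trace would be the series handed to a burst predictor in place of the counts of the raw trace.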
