Using Structural Similarity to Predict Future Workload Behavior in the Cloud

Predicting how a workload behaves in response to changes in allocated resources is a critical part of effective resource management in the cloud. This paper presents a novel approach to predicting the future behavior of a reference workload based on a nearest-neighbor similarity search in Euclidean space. The proposed approach involves identifying a similar candidate workload, predicting the future behavior of the reference from that candidate, and then validating the prediction using a statistical hypothesis test. Finally, decision rules are generated that specify the conditions required for a successful prediction.
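
The first two steps of the approach can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`nearest_candidate`, `predict_future`), the representation of a workload as a list of equally spaced utilization samples, and the choice to align the reference with the start of each candidate are all assumptions made for illustration.

```python
import math


def nearest_candidate(reference, candidates):
    """Return (name, distance) of the candidate closest to the reference.

    Assumption: each workload is a list of equally spaced samples, and the
    reference is compared against the first len(reference) samples of each
    candidate using the Euclidean distance.
    """
    n = len(reference)
    best_name, best_dist = None, math.inf
    for name, series in candidates.items():
        # Skip candidates with no samples beyond the matched window,
        # since they offer no "future" to borrow for the prediction.
        if len(series) <= n:
            continue
        dist = math.dist(reference, series[:n])
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist


def predict_future(reference, candidate, horizon):
    """Predict the next `horizon` samples of the reference.

    Assumption: the prediction is simply the candidate's samples that
    follow the matched window.
    """
    n = len(reference)
    return candidate[n:n + horizon]


# Hypothetical data, for illustration only.
reference = [1.0, 2.0, 3.0]
candidates = {
    "a": [1.0, 2.0, 3.5, 4.0, 5.0],
    "b": [9.0, 9.0, 9.0, 9.0, 9.0],
    "short": [1.0, 2.0, 3.0],
}
name, dist = nearest_candidate(reference, candidates)
forecast = predict_future(reference, candidates[name], horizon=2)
```

The validation step is left out of the sketch because the abstract does not name the hypothesis test; in practice the forecast would be compared against the reference's observed samples once they become available.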
