Integrating clustering and regression for workload estimation in the cloud

Workload prediction has been widely researched in the literature. However, existing techniques are per-job based and are useful mainly for service-like tasks whose workloads exhibit seasonality and trend. Cloud jobs have many different workload patterns, and some do not exhibit recurring patterns at all. We consider job-pool-based workload estimation, which analyzes the workload characteristics of existing tasks to estimate the workload of currently running tasks. We first cluster existing tasks based on their workloads. For a new task J, we collect the initial workload of J, determine which cluster J most likely belongs to, and then use that cluster's characteristics to estimate J's workload. The algorithm is evaluated experimentally on the Google dataset, and its effectiveness is confirmed. However, the workload patterns of some tasks do have seasonality and trend, and for these tasks conventional per-job regression methods may yield better predictions. Also, in some cases, new tasks may not follow the workload patterns of any existing task in the pool. We therefore develop an integrated scheme that combines clustering and regression and uses the better of the two for workload prediction. An experimental study shows that the combined approach further improves the accuracy of workload prediction.
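The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the distance measure, the regression model, or the rule for choosing between the two predictors. The sketch assumes plain k-means with Euclidean distance over equal-length workload series, a linear trend fit as the per-job regression, and a selection rule that picks whichever method fits the observed prefix with lower error. All function names (`kmeans`, `cluster_estimate`, `regression_estimate`, `integrated_estimate`) are hypothetical.

```python
import numpy as np

def kmeans(series, k, iters=50, seed=0):
    """Lloyd's k-means over equal-length workload series (one per row).
    Assumption: Euclidean distance; the paper may use a different measure."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(series), size=k, replace=False)
    centroids = series[picks].astype(float)
    for _ in range(iters):
        # Distance from every series to every centroid, shape (n, k).
        dists = np.linalg.norm(series[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = series[labels == c]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    return centroids

def cluster_estimate(centroids, prefix, horizon):
    """Match the observed initial workload to the closest centroid prefix
    and return that centroid's continuation plus the match RMSE."""
    n = len(prefix)
    errors = np.linalg.norm(centroids[:, :n] - prefix, axis=1) / np.sqrt(n)
    best = errors.argmin()
    return centroids[best, n:n + horizon], errors[best]

def regression_estimate(prefix, horizon):
    """Fit a linear trend to the prefix and extrapolate it.
    Assumption: a stand-in for whatever per-job regression is used."""
    t = np.arange(len(prefix))
    slope, intercept = np.polyfit(t, prefix, 1)
    rmse = np.sqrt(np.mean((slope * t + intercept - prefix) ** 2))
    future_t = np.arange(len(prefix), len(prefix) + horizon)
    return slope * future_t + intercept, rmse

def integrated_estimate(centroids, prefix, horizon):
    """Assumed selection rule: use the method with lower error on the prefix."""
    c_pred, c_err = cluster_estimate(centroids, prefix, horizon)
    r_pred, r_err = regression_estimate(prefix, horizon)
    return ("cluster", c_pred) if c_err <= r_err else ("regression", r_pred)
```

In this sketch, a new task whose initial workload resembles a pooled pattern is estimated from its cluster, while a task with a clear trend that no cluster captures falls back to regression. The prefix-error comparison is only one plausible way to combine the two predictors.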
