Quantitative workload analysis and prediction using Google cluster traces

Resource allocation efficiency and energy consumption are among the top concerns for today's cloud data centers. Finding the operating point at which users' job requests are completed on time with minimum electricity and hardware cost is a key step for system designers and managers in optimizing system configurations. Understanding the distribution of user tasks is essential for this purpose. In large-scale cloud computing data centers, precise workload prediction helps designers and operators schedule hardware and software resources and power supplies more efficiently, and make appropriate decisions about upgrading the cloud system as the workload grows. While many studies have addressed hypervisor-based clouds, container-based virtualization is becoming popular because of its low overhead and high efficiency in utilizing computing resources. In this paper, we study a set of real-world container data center traces from part of Google's cluster. We investigate the distributions of job duration, waiting time, and machine utilization, as well as the number of jobs submitted in a fixed time period. Based on this quantitative study, we propose an Ensemble Workload Prediction (EnWoP) method and a novel prediction evaluation parameter called the Cloud Workload Correction Rate (C-Rate). The experimental results verify that the EnWoP method achieves high prediction accuracy and that the C-Rate evaluates prediction methods more objectively.
