Analyzing Real Cluster Data for Formulating Allocation Algorithms in Cloud Platforms

A common problem in Computer Science research is the lack of real usage data for validating algorithms. This problem is particularly acute in Cloud Computing: because the data managed by commercial Cloud infrastructures is private and massive in scale, it is rarely made available to the research community. Because of this scale, designing resource allocation algorithms for Cloud infrastructures requires many assumptions to keep the problem tractable. This paper provides an in-depth analysis of a cluster data trace recently released by Google and focuses on a number of questions that have not been addressed in previous studies. In particular, we describe the characteristics of job resource usage in terms of dynamics (how usage varies over time), correlation between jobs (whether daily and/or weekly patterns can be identified), and correlation within jobs between different resources (how memory usage depends on CPU usage). From this analysis, we propose a way to formalize the allocation problem on such platforms that captures most job features from the trace with a small set of parameters.
