Analyzing real cluster data for formulating allocation algorithms in cloud platforms

We analyze a large cluster trace released by Google. We provide information about static and dynamic features of dominant jobs. We show that memory usage of tasks is independent of CPU usage for most jobs. We analyze the independence of machine failures. Based on this analysis, we propose several algorithmic formulations for allocation problems.

A problem commonly faced in Computer Science research is the lack of real usage data that can be used to validate algorithms. This is particularly true, and particularly critical, in Cloud Computing. Because the data managed by commercial Cloud infrastructures is private and massive in scale, it is rarely made available to the research community. This scale also means that, when designing resource allocation algorithms for Cloud infrastructures, many assumptions must be made in order to keep the problem tractable. This paper provides an in-depth analysis of a cluster data trace recently released by Google and focuses on a number of questions that have not been addressed in previous studies. In particular, we describe the characteristics of job resource usage in terms of dynamics (how usage varies with time), of correlation between jobs (identifying daily and/or weekly patterns), and of correlation inside jobs between the different resources (dependence of memory usage on CPU usage). From this analysis, we propose a way to formalize the allocation problem on such platforms, which encompasses most job features from the trace with a small set of parameters.
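As an illustration of the intra-job analysis described above (dependence of memory usage on CPU usage), the following minimal sketch computes a per-job Pearson correlation between CPU and memory samples. It is not the paper's method: the record layout `(job_id, cpu, mem)`, the function names, and the sample values are all assumptions made for this example, and a correlation close to zero only suggests, and does not prove, independence.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Returns None when the coefficient is undefined (fewer than two
    points, mismatched lengths, or zero variance in either sequence).
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def per_job_cpu_mem_correlation(samples):
    """Group (job_id, cpu, mem) samples by job and correlate cpu with mem.

    The tuple layout is a hypothetical simplification, not the schema
    of the actual trace.
    """
    by_job = defaultdict(lambda: ([], []))
    for job_id, cpu, mem in samples:
        cpus, mems = by_job[job_id]
        cpus.append(cpu)
        mems.append(mem)
    return {job: pearson(cpus, mems) for job, (cpus, mems) in by_job.items()}


# Made-up samples: job_a's memory tracks its CPU usage, job_b's memory is flat.
samples = [
    ("job_a", 0.10, 0.20), ("job_a", 0.20, 0.40), ("job_a", 0.30, 0.60),
    ("job_b", 0.10, 0.50), ("job_b", 0.40, 0.50), ("job_b", 0.25, 0.50),
]
correlations = per_job_cpu_mem_correlation(samples)
for job, r in sorted(correlations.items()):
    print(job, "undefined (constant series)" if r is None else round(r, 3))
```

A job whose memory footprint stays constant while its CPU usage varies yields an undefined coefficient here (zero variance), which is reported separately rather than being treated as a correlation of zero.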
