Characterizing Cloud Applications on a Google Data Center

In this paper, we characterize Google applications using a one-month Google trace that covers more than 650k jobs running on more than 12,000 heterogeneous hosts in a Google data center. First, we compute statistics on task events and resource utilization for Google applications, broken down by resource type (such as CPU and memory) and by execution type (e.g., whether they can run batch tasks or not). Resource utilization per application closely follows the Pareto principle. Second, we classify applications with a K-means clustering algorithm, using an optimized number of clusters, based on task events and resource usage. The number of applications per K-means cluster follows a Pareto-like distribution. We believe this characterization is valuable for further investigation of Cloud environments.
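As a rough illustration of what a Pareto-style observation on per-application resource usage means, the sketch below computes the share of total usage consumed by the top 20% of applications. The function name `top_share` and the usage figures are hypothetical and chosen for illustration; they are not taken from the Google trace or from the paper's method.

```python
def top_share(usage, top_fraction=0.2):
    """Return the fraction of total usage consumed by the top `top_fraction` of applications."""
    if not usage:
        raise ValueError("usage must be non-empty")
    ranked = sorted(usage, reverse=True)           # heaviest consumers first
    k = max(1, round(len(ranked) * top_fraction))  # number of applications in the top group
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical per-application CPU usage in arbitrary units (not trace data).
cpu_usage = [500, 300, 50, 40, 30, 25, 20, 15, 10, 10]
share = top_share(cpu_usage)
print(f"top 20% of applications use {share:.0%} of CPU")
```

A share close to 0.8 for the top 20% would match the classic 80/20 form of the Pareto principle; real traces may show a stronger or weaker skew.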
