Statistical Analysis and Modeling of Heterogeneous Workloads on Amazon's Public Cloud Infrastructure

Workload modeling in public cloud environments is challenging because of infrastructure abstraction, workload heterogeneity, and the lack of defined metrics for performance modeling. This paper presents an approach that applies statistical methods for distribution analysis, parameter estimation, and Goodness-of-Fit (GoF) testing to develop theoretical (estimated) models of heterogeneous workloads on Amazon’s public cloud infrastructure from compute, memory, and IO resource utilization data.
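As a rough illustration of the fit-then-test workflow described above, the sketch below estimates parameters for two candidate distributions by maximum likelihood and ranks them by the Kolmogorov-Smirnov (KS) statistic. Everything in it is an assumption made for illustration: the candidate families (normal and exponential), the function names, and the synthetic utilization sample are not taken from the paper, whose actual distributions and estimation methods may differ.

```python
import math
import random


def fit_normal(xs):
    """Maximum-likelihood estimates (mean, std) for a normal model."""
    mu = sum(xs) / len(xs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return mu, sigma


def fit_exponential(xs):
    """Maximum-likelihood estimate of the rate for an exponential model."""
    return len(xs) / sum(xs)


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def exponential_cdf(x, rate):
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def ks_statistic(xs, cdf):
    """Largest gap between the empirical CDF of xs and the model CDF."""
    ordered = sorted(xs)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        f = cdf(x)
        # Check the gap just after and just before each empirical step.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def rank_models(xs):
    """Fit each candidate and return (name, KS statistic), best fit first."""
    mu, sigma = fit_normal(xs)
    rate = fit_exponential(xs)
    scores = {
        "normal": ks_statistic(xs, lambda x: normal_cdf(x, mu, sigma)),
        "exponential": ks_statistic(xs, lambda x: exponential_cdf(x, rate)),
    }
    return sorted(scores.items(), key=lambda kv: kv[1])


# Synthetic stand-in for a utilization trace (hypothetical data, not from the paper).
rng = random.Random(0)
sample = [rng.expovariate(1.0 / 20.0) for _ in range(500)]

for name, d in rank_models(sample):
    print(f"{name}: KS statistic = {d:.4f}")
```

The sketch compares KS statistics only and does not report p-values, because standard KS critical values assume the model parameters were not estimated from the same sample. A smaller statistic indicates a closer match between the fitted model and the empirical distribution.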
