Evaluation and design of highly reliable and highly utilized cloud computing systems

The cloud computing paradigm has created the need to provide resources to users in a scalable, flexible, and transparent fashion, much like any other utility. This, in turn, calls for evaluation techniques that provide quantitative measures of the reliability of a cloud computing system (CCS) for efficient planning and expansion. This paper presents a new, scalable algorithm based on non-sequential Monte Carlo Simulation (MCS) to evaluate the reliability of large-scale CCSs, and it develops appropriate performance measures. It also proposes a new iterative algorithm that leverages the MCS method to design highly reliable and highly utilized CCSs. Together, these two algorithms allow providers and users alike to evaluate CCSs, offering a new method for estimating the parameters of service level agreements (SLAs) and for designing CCSs that meet the contractual requirements those SLAs impose. Results demonstrate that the proposed methods are effective and applicable to large-scale systems. The paper also offers several insights into the nature of CCS reliability and CCS design.
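The abstract does not specify the system model, so the following is only a minimal sketch under hypothetical assumptions: a CCS made of identical servers that fail independently, each hosting a fixed number of virtual machines (VMs), where a sampled system state counts as a success if the available capacity covers a fixed VM demand. The names `CCSModel`, `mcs_reliability`, `exact_reliability`, and `design_ccs` are illustrative and are not taken from the paper. The sketch shows the two ideas the abstract names: non-sequential MCS (every component state is drawn independently per sample, with no time chronology) to estimate reliability and expected unserved demand, and an iterative loop that adds servers until a reliability target is met and then reports utilization.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class CCSModel:
    """Hypothetical toy CCS: identical, independently failing servers hosting VMs."""
    n_servers: int        # number of physical servers
    availability: float   # probability that a server is up in a sampled state
    vms_per_server: int   # VM capacity of a single server
    demand_vms: int       # number of VMs the users require


def mcs_reliability(model, n_samples=100_000, seed=0):
    """Non-sequential MCS: each sample draws every server state independently
    (no chronology) and checks whether the surviving capacity covers demand.

    Returns (reliability estimate, expected unserved VMs per sample).
    """
    rng = random.Random(seed)
    successes = 0
    unserved_total = 0
    for _ in range(n_samples):
        # Count servers that are up in this sampled state.
        up = sum(rng.random() < model.availability for _ in range(model.n_servers))
        capacity = up * model.vms_per_server
        if capacity >= model.demand_vms:
            successes += 1
        else:
            unserved_total += model.demand_vms - capacity
    return successes / n_samples, unserved_total / n_samples


def exact_reliability(model):
    """Closed-form check that exists only for this toy model:
    P(number of up servers >= servers needed), a binomial tail."""
    k = math.ceil(model.demand_vms / model.vms_per_server)
    a, n = model.availability, model.n_servers
    return sum(math.comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


def design_ccs(availability, vms_per_server, demand_vms, target,
               max_servers=10_000, **mcs_kwargs):
    """Hypothetical iterative sizing: start from the smallest server count that
    can host the demand, then add one server at a time until the MCS estimate
    meets the reliability target. Stopping at the first such count keeps
    utilization as high as possible for that target.

    Returns (n_servers, estimated reliability, utilization).
    """
    n = math.ceil(demand_vms / vms_per_server)
    while n <= max_servers:
        model = CCSModel(n, availability, vms_per_server, demand_vms)
        rel, _ = mcs_reliability(model, **mcs_kwargs)
        if rel >= target:
            utilization = demand_vms / (n * vms_per_server)
            return n, rel, utilization
        n += 1
    raise ValueError("target reliability not reached within max_servers")
```

Under these assumptions, `design_ccs(0.95, 8, 80, 0.99, n_samples=20_000)` should return 13 servers at roughly 77% utilization, because 12 servers give an exact reliability of about 0.980 and 13 give about 0.997. `exact_reliability` is included only because this toy model has a closed form for checking the simulation; the value of MCS lies in systems whose structure has no such formula.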
