Optimizing the Cloud Data Center Availability Empowered by Surrogate Models

Making data centers highly available remains a challenge that must be addressed from the design phase onward. The problem is to select the strategies and components that achieve this goal under a limited investment. Furthermore, data center designers currently lack reliable, specialized tools for this task. In this paper, we present a formal method that chooses the components and strategies that optimize the availability of a data center while treating a given budget as a constraint. To that end, we use stochastic models to represent a cloud data center infrastructure based on the TIA-942 standard. To reduce the computational cost of solving this optimization problem, we employ surrogate models to handle the complexity of the stochastic models. In this work, we use a Gaussian process to produce a surrogate model for a cloud data center infrastructure, and we use three derivative-free optimization algorithms to explore the search space and find optimal solutions. From the results, we observe that the Differential Evolution (DE) algorithm outperforms the other tested algorithms, since it achieves higher availability with a fair usage of the
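
The workflow described in the abstract (sample an expensive availability model, fit a Gaussian process surrogate, then search the surrogate with Differential Evolution under a budget constraint) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy `expensive_model` (a series of parallel groups standing in for the stochastic models), the per-unit availabilities and costs, the budget value, the RBF kernel, the penalty weight, and the DE/rand/1/bin settings. Redundancy levels are treated as continuous to keep the example short.

```python
import math
import random

# Hypothetical toy stand-in for an expensive stochastic availability model:
# three subsystems in series, each with x[i] redundant units in parallel.
UNIT_AVAIL = [0.95, 0.97, 0.90]   # assumed per-unit availabilities
UNIT_COST = [4.0, 6.0, 3.0]       # assumed per-unit costs
BOUNDS = [(1.0, 4.0)] * 3         # assumed redundancy range per subsystem
BUDGET = 30.0                     # assumed budget constraint

def expensive_model(x):
    """Availability of a series system of parallel groups."""
    avail = 1.0
    for a, n in zip(UNIT_AVAIL, x):
        avail *= 1.0 - (1.0 - a) ** n
    return avail

def cost(x):
    return sum(c * n for c, n in zip(UNIT_COST, x))

def rbf(u, v, length=1.0):
    """Squared-exponential kernel."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / (2 * length ** 2))

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def fit_gp(X, y, noise=1e-6):
    """Return a function that evaluates the GP posterior mean."""
    mu = sum(y) / len(y)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    alpha = solve(K, [v - mu for v in y])
    return lambda x: mu + sum(w * rbf(x, xi) for w, xi in zip(alpha, X))

def differential_evolution(objective, bounds, rng, pop_size=20, gens=60, F=0.6, CR=0.9):
    """DE/rand/1/bin minimizer with clipping to the bounds."""
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(p) for p in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            jrand = rng.randrange(dim)
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if rng.random() < CR or d == jrand:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    v = min(max(v, lo), hi)
                else:
                    v = pop[i][d]
                trial.append(v)
            f = objective(trial)
            if f <= fit[i]:
                pop[i], fit[i] = trial, f
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best]

rng = random.Random(0)
# Sample the expensive model once, then optimize only on the cheap surrogate.
X_train = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(40)]
y_train = [expensive_model(x) for x in X_train]
surrogate = fit_gp(X_train, y_train)

def penalized(x):
    """Negated surrogate availability plus a penalty for exceeding the budget."""
    over = max(0.0, cost(x) - BUDGET)
    return -surrogate(x) + 10.0 * over

best = differential_evolution(penalized, BOUNDS, rng)
print("design:", [round(v, 2) for v in best])
print("cost:", round(cost(best), 2))
print("surrogate:", round(surrogate(best), 4), "model:", round(expensive_model(best), 4))
```

The point of the design is that the optimizer calls `surrogate` thousands of times while `expensive_model` is evaluated only for the 40 training samples and once more to check the final design. In practice a library Gaussian process and optimizer would likely replace the hand-written `fit_gp` and `differential_evolution`.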
