Reliable Virtual Machine Placement and Routing in Clouds

In current cloud computing systems that leverage virtualization technology, a customer's requested computing or storage service is accommodated by a set of communicating virtual machines (VMs) in a scalable and elastic manner. These VMs are placed on one or more server nodes according to the node capacities or failure probabilities. The VM placement availability is the probability that at least one complete set of the customer's requested VMs operates during the requested lifetime. In this paper, we first study the problem of placing at most $H$ groups of $k$ requested VMs on a minimum number of nodes, such that the VM placement availability is no less than $\delta$ and the specified communication delay and connection availability for each VM pair within the same placement group are not violated. We consider this problem with and without Shared-Risk Node Group (SRNG) failures, and prove that it is NP-hard in both cases. We subsequently propose an exact Integer Nonlinear Program (INLP) and an efficient heuristic to solve it. We conduct simulations to compare the performance of the proposed algorithms with that of two existing heuristics.
Finally, we study the related reliable routing problem of establishing a connection over at most $w$ link-disjoint paths from a source to a destination, such that the connection availability requirement is satisfied and the delay of each path is no more than a given value. We devise an exact algorithm and two heuristics to solve this NP-hard problem, and evaluate them via simulations.
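
To make the two availability notions above concrete, the following is a minimal Python sketch, not the paper's INLP or heuristics. It assumes that nodes and links fail independently (so it does not model SRNG failures), that a placement group operates only if every node hosting one of its VMs is up, and that a path operates only if every one of its links is up. The function names `placement_availability` and `disjoint_paths_availability`, and all numeric values, are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import prod


def placement_availability(node_avail, groups):
    """Probability that at least one placement group has all of its nodes up.

    node_avail: dict mapping node name -> availability in [0, 1]
    groups: list of sets of node names, one set per placement group
    Assumption: nodes fail independently. Every up/down combination of the
    nodes is enumerated, so groups may share nodes, but the running time is
    exponential in the number of nodes (small instances only).
    """
    nodes = sorted(node_avail)
    total = 0.0
    for states in product([True, False], repeat=len(nodes)):
        # Probability of this particular up/down combination.
        p = prod(
            node_avail[n] if is_up else 1.0 - node_avail[n]
            for n, is_up in zip(nodes, states)
        )
        up = {n for n, is_up in zip(nodes, states) if is_up}
        # The placement operates if some group is fully hosted on up nodes.
        if any(set(g) <= up for g in groups):
            total += p
    return total


def disjoint_paths_availability(paths):
    """Availability of a connection over link-disjoint paths.

    paths: list of paths, each given as a list of link availabilities.
    Assumption: links fail independently and the paths share no link, so the
    connection fails only if every path fails.
    """
    return 1.0 - prod(1.0 - prod(links) for links in paths)


# Hypothetical example values.
avail = {"n1": 0.9, "n2": 0.9, "n3": 0.8}
a_disjoint = placement_availability(avail, [{"n1", "n2"}, {"n3"}])
a_shared = placement_availability(avail, [{"n1", "n2"}, {"n1", "n3"}])
a_paths = disjoint_paths_availability([[0.99, 0.98], [0.95]])
print(a_disjoint, a_shared, a_paths)
```

With these example values, two groups on disjoint nodes give an availability of about 0.962, while two groups that both depend on `n1` give about 0.882. Under the stated assumptions, this shows that sharing a node between groups lowers the placement availability. A placement would meet the requirement only if the computed value is no less than $\delta$.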
