Server placement with shared backups for disaster-resilient clouds

A key strategy for building disaster-resilient clouds is to keep backups of virtual machines in a geo-distributed infrastructure. Today, several hypervisors offer continuous, acknowledged replication of virtual machines to different servers as a service. This strategy guarantees that virtual machines lose no disk or memory content if a disaster occurs, at the cost of strict bandwidth and latency requirements. Considering this kind of service, we formulate an optimization problem for placing servers in a wide area network. The goal is to guarantee that backup machines do not fail at the same time as their primary counterparts. In addition, by using virtualization, we aim to reduce the number of backup servers required. The optimal results, obtained on real topologies, reduce the number of backup servers by at least 40%. Moreover, this work highlights several characteristics of the backup service that depend on the underlying network, such as the fulfillment of latency requirements.
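The intuition behind sharing backup servers can be illustrated with a minimal sketch. It is not the optimization problem formulated in this work; it assumes that only one disaster occurs at a time and that each disaster is described by a zone, that is, a set of sites that fail together. The function names, site labels, and zone sets below are hypothetical. Under these assumptions, a backup site needs only enough servers for the worst single disaster among the primaries it protects, instead of one backup per primary.

```python
from typing import Dict, List, Set


def shares_zone(backup_site: str, primary_site: str, zones: List[Set[str]]) -> bool:
    """Return True if some disaster zone contains both sites.

    A backup placed in such a site could fail together with its primary.
    """
    return any(backup_site in zone and primary_site in zone for zone in zones)


def backups_needed(primaries: Dict[str, int], zones: List[Set[str]]) -> Dict[str, int]:
    """Compare dedicated and shared backup counts for one backup site.

    primaries: number of primary servers per site protected by this backup site.
    zones: hypothetical disaster zones (assumption: one disaster at a time).
    """
    # Dedicated: one backup server for every primary server.
    dedicated = sum(primaries.values())
    # Shared: enough servers for the largest loss caused by a single zone.
    shared = max(
        (sum(primaries.get(site, 0) for site in zone) for zone in zones),
        default=0,
    )
    return {"dedicated": dedicated, "shared": shared}


# Hypothetical input: three primary sites; sites A and B can also fail together.
example_zones = [{"A"}, {"B"}, {"C"}, {"A", "B"}]
example = backups_needed({"A": 4, "B": 3, "C": 5}, example_zones)
```

In this toy input the shared count is driven by the zone covering both A and B, so any savings depend entirely on how the disaster zones overlap; the figures here are illustrative and unrelated to the results reported above.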
