Design of Disaster-Resilient Optical Datacenter Networks

Survivability against disasters, whether natural or deliberate attacks, that span large geographical areas is becoming a major challenge in communication networks. Cloud services delivered by datacenter networks yield new opportunities to provide protection against disasters. Cloud services require a network substrate with high capacity, low latency, high availability, and low cost, which optical networks can deliver. In such networks, path protection against network failures is generally ensured by providing a backup path to the same destination (i.e., a datacenter) that is link-disjoint from the primary path. This scheme fails when a disaster covers an area that disrupts both the primary and backup paths. A generic protection scheme also does not protect against failure of the destination (datacenter) node. Moreover, content/service protection is a fundamental problem in a datacenter network, as the failure of a datacenter should not cause a specific content/service to disappear from the network. Content placement, routing, and protection of paths and content should therefore be addressed together. In this work, we propose an integrated Integer Linear Program (ILP) to design an optical datacenter network, which solves the above-mentioned problems simultaneously. We show that our disaster protection scheme, which exploits anycasting, provides more protection yet uses less capacity than dedicated single-link failure protection. We also show that a reasonable number of datacenters and selective content replicas, combined with intelligent network design, can provide survivability against disasters while supporting user demands. Finally, we propose ILP relaxations and heuristics to solve the problem for large networks.
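
To make the distinction between link-disjoint and disaster-disjoint protection concrete, the following minimal Python sketch checks a primary/backup path pair against a set of disaster zones. It is an illustration only, not the ILP proposed in this work: the toy topology (`s`, `a`, `b`, `d1`, `d2`), the zone representation, and all function names are assumptions introduced here, and the anycast case presumes the requested content is replicated at both datacenters.

```python
def path_links(path):
    """Return the set of undirected links traversed by a node path."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


def hit_by(path, zone, source):
    """True if the zone damages any link or any non-source node of the path."""
    nodes = set(path) - {source}
    return bool(nodes & zone["nodes"]) or bool(path_links(path) & zone["links"])


def link_disjoint(primary, backup):
    """Classic path protection check: the two paths share no link."""
    return not (path_links(primary) & path_links(backup))


def disaster_disjoint(primary, backup, zones):
    """True if no single disaster zone disrupts both paths at once."""
    source = primary[0]
    return all(
        not (hit_by(primary, z, source) and hit_by(backup, z, source))
        for z in zones
    )


# Hypothetical disaster zones: one destroys datacenter d1, the other cuts
# two geographically close links that both lead into d1.
zones = [
    {"nodes": {"d1"}, "links": set()},
    {"nodes": set(), "links": {frozenset(("a", "d1")), frozenset(("b", "d1"))}},
]

# Generic protection: link-disjoint paths to the same datacenter d1.
same_dc = (["s", "a", "d1"], ["s", "b", "d1"])
# Anycast protection: the backup path ends at a second datacenter d2
# (assumed to hold a replica of the content).
anycast = (["s", "a", "d1"], ["s", "b", "d2"])

same_dc_link_ok = link_disjoint(*same_dc)                 # True
same_dc_disaster_ok = disaster_disjoint(*same_dc, zones)  # False
anycast_disaster_ok = disaster_disjoint(*anycast, zones)  # True
```

In this toy case the same-destination pair passes the link-disjointness check yet is defeated by either zone, whereas the anycast pair survives both because its backup path terminates at a different datacenter.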
