Recovery from Link Failures in Networks with Arbitrary Topology via Diversity Coding

Link failures in wide area networks are common. To recover from such failures, a number of methods such as SONET rings, protection cycles, and source rerouting have been investigated. Two important considerations in such approaches are the recovery time and the spare capacity needed to complete the recovery. Usually, these techniques attempt to achieve a recovery time of less than 50 ms. In this paper we introduce an approach that provides link failure recovery in a hitless manner, that is, without any appreciable delay. This is achieved by means of a method called diversity coding. We present an algorithm for the design of an overlay network that achieves recovery from single link failures in arbitrary networks via diversity coding. The algorithm is designed to minimize the spare capacity needed for recovery. We compare this algorithm against conventional techniques in terms of recovery time, spare capacity, and a joint metric called Quality of Recovery (QoR), which incorporates both the spare capacity percentage and the worst-case recovery time. Based on these results, we conclude that the proposed technique provides much shorter recovery times while requiring similar spare capacity, and therefore achieves better QoR performance overall.
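To illustrate why diversity coding can recover without rerouting delay, the following is a minimal sketch of its simplest form: N data links protected by one extra link carrying the bitwise XOR of the data blocks. It is not the overlay design algorithm of this paper. The function names (`encode_parity`, `recover`), the equal block lengths, and the use of `None` to mark a failed link are assumptions made for illustration only.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(blocks: List[bytes]) -> bytes:
    """Parity block sent on the protection link: XOR of all data blocks."""
    return reduce(xor_bytes, blocks)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild the data blocks when at most one data link has failed.

    A failed link is marked by None (an assumption of this sketch).
    """
    missing = [i for i, block in enumerate(received) if block is None]
    if not missing:
        return list(received)
    if len(missing) > 1:
        raise ValueError("a single parity link can cover only one failure")
    survivors = [block for block in received if block is not None]
    # XOR of the parity with every surviving block yields the lost block.
    restored = reduce(xor_bytes, survivors, parity)
    result = list(received)
    result[missing[0]] = restored
    return result


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    parity = encode_parity(data)
    # Simulate failure of the second link.
    print(recover([data[0], None, data[2]], parity) == data)
```

Because the receiver already holds the parity block when the failure occurs, it reconstructs the lost data locally, without signaling the source or switching paths.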
