Network Reliability and Fault Tolerance

This article introduces fundamental concepts in fault tolerance and reliability as applied to networks, with a focus on high-speed backbone networks. We first introduce the basic terminology and elements necessary for fault tolerance, then describe the basic strategies for detecting and handling failures in networks. The discussion builds on these basics to illustrate recovery schemes for high-speed backbone networks. An examination of the relationship between network topologies and recovery strategies follows, beginning with rings and then covering more general mesh networks. We then cover some basic techniques used in packet-switched networks and conclude with a discussion of local-area network approaches to fault tolerance.

Keywords: networking; optical networks; circuit-switched; packet-switched; SONET; ATM; reliability; fault tolerance
