Resilience and survivability in communication networks: Strategies, principles, and survey of disciplines

The Internet has become essential to all aspects of modern life, and thus the consequences of network disruption have become increasingly severe. It is widely recognised that the Internet is not sufficiently resilient, survivable, and dependable, and that significant research, development, and engineering are necessary to improve the situation. This paper presents an architectural framework for resilience and survivability in communication networks and surveys the disciplines that resilience encompasses, along with significant past failures of the network infrastructure. It presents a resilience strategy to defend against, detect, and remediate challenges, sets out principles for designing resilient networks, and describes techniques for analysing network resilience.
