Network resilience: a systematic approach

The cost of failures within communication networks is significant and will only increase as networks reach further into the way our society functions. Some aspects of network resilience, such as the application of fault-tolerant systems techniques to optical switching, have been studied and applied to great effect. However, networks, and the Internet in particular, are still vulnerable to malicious attacks, human mistakes such as misconfigurations, and a range of environmental challenges. We argue that this is due in part to the lack of a holistic view of the resilience problem, which leads to inappropriate and difficult-to-manage solutions. In this article, we present a systematic approach to building resilient networked systems. We first study fundamental elements at the framework level, such as metrics, policies, and information sensing mechanisms. Understanding these elements drives the design of a distributed multilevel architecture that lets the network defend itself against, detect, and dynamically respond to challenges. We then use a concrete case study to show how the framework and mechanisms we have developed can be applied to enhance resilience.
