Effects of Partial Topology on Fault Diagnosis

Network components can fail for a variety of reasons, and it is often not immediately obvious which component has failed. Fault diagnosis algorithms are therefore needed to localize failures and enable the recovery process. Most current state-of-the-art fault diagnosis algorithms assume full knowledge of the network topology, which may not be available in real scenarios. In this paper, we examine the performance of one such algorithm, Max-Coverage (MC), when the topology is only partially known. We introduce a simple extension, called the Virtual Topology (VT), that allows faults to be identified correctly when a failure occurs in an unobserved component. We compare the performance of MC under partial topology knowledge with and without this extension, and show that VT significantly improves the rate of correct diagnosis, but at the cost of a high number of false positives. Moreover, we demonstrate that correctly inferring areas of the unobserved network substantially mitigates the drawbacks associated with using VT.
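
To make the setting concrete, the following is a minimal sketch of a greedy maximum-coverage style diagnosis, with an optional placeholder for unobserved regions. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the abstract does not give the details of MC or VT, so the function names (greedy_max_coverage, add_virtual_components), the path representation (a set of known component names per path), the rule that any component on a working path is healthy, and the "virt:" labels are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Paths = Dict[str, Set[str]]  # path id -> names of the KNOWN components on that path


def greedy_max_coverage(failed: Paths, working: Paths) -> Tuple[List[str], Set[str]]:
    """Greedy set-cover style diagnosis (illustrative sketch, not the paper's exact MC).

    Returns the list of suspected components and the ids of failed paths
    that no known candidate component can explain.
    """
    # Assumption: every component on a working path is healthy.
    healthy: Set[str] = set().union(*working.values())
    unexplained: Set[str] = set(failed)
    suspects: List[str] = []
    while unexplained:
        # Count how many unexplained failed paths each candidate would cover.
        counts: Dict[str, int] = {}
        for pid in unexplained:
            for comp in failed[pid] - healthy:
                counts[comp] = counts.get(comp, 0) + 1
        if not counts:
            break  # remaining failures cannot be explained by known components
        # Pick the candidate covering the most paths; ties broken alphabetically.
        best = max(sorted(counts), key=counts.get)
        suspects.append(best)
        unexplained -= {pid for pid in unexplained if best in failed[pid]}
    return suspects, unexplained


def add_virtual_components(paths: Paths, region_of: Dict[str, str]) -> Paths:
    """Hypothetical stand-in for the VT idea: attach one placeholder component
    to each path that is known to cross an unobserved region."""
    return {
        pid: comps | ({region_of[pid]} if pid in region_of else set())
        for pid, comps in paths.items()
    }


# Toy data (made up): p3 fails, but its only known component r4 also lies on a working path.
failed = {"p1": {"r1", "r2"}, "p2": {"r2", "r3"}, "p3": {"r4"}}
working = {"p4": {"r1", "r3", "r4"}}

plain = greedy_max_coverage(failed, working)

region_of = {"p3": "virt:A"}  # assumed: p3 crosses an unobserved region labelled A
with_vt = greedy_max_coverage(
    add_virtual_components(failed, region_of),
    add_virtual_components(working, region_of),
)
```

In this toy case the plain run blames r2 and leaves p3 unexplained, while the run with placeholder components additionally blames virt:A. A placeholder covering a large unobserved region would implicate every component inside it, which is one plausible reading of why such an extension raises false positives and why inferring the unobserved areas more precisely helps.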
