Practical issues with using network tomography for fault diagnosis

This paper investigates the practical issues in applying network tomography to failure monitoring. We outline an approach for selecting paths to monitor, detecting and confirming the existence of a failure, correlating multiple independent observations into a single failure event, and applying existing binary network tomography algorithms to identify failures. We evaluate how well network tomography algorithms correctly detect and identify failures in a controlled environment on the VINI testbed.
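To make the last step concrete, the following is a minimal sketch of one common style of binary tomography: treat every link on a working path as good, then greedily pick the smallest set of remaining links that explains all failed paths. It is an illustration under stated assumptions, not necessarily the algorithm evaluated in the paper. The function name `localize_failures`, the path identifiers, and the link labels are all hypothetical.

```python
from typing import Dict, List, Set


def localize_failures(paths: Dict[str, List[str]], failed: Set[str]) -> Set[str]:
    """Greedy sketch of binary tomography (hypothetical helper).

    paths  maps a path id to the list of links it traverses.
    failed is the set of path ids currently observed as failed.
    Returns a small set of links that explains every failed path.
    """
    # Assumption: a path works only if all of its links work,
    # so any link on a working path is considered good.
    good_links: Set[str] = set()
    for path_id, links in paths.items():
        if path_id not in failed:
            good_links.update(links)

    # For each failed path, keep only the links that could be at fault.
    unexplained = {pid: set(paths[pid]) - good_links for pid in failed}
    # Failed paths with no candidate link are inconsistent observations;
    # this sketch simply ignores them.
    unexplained = {pid: cand for pid, cand in unexplained.items() if cand}

    suspects: Set[str] = set()
    while unexplained:
        # Count how many unexplained failed paths each candidate link covers.
        counts: Dict[str, int] = {}
        for candidates in unexplained.values():
            for link in candidates:
                counts[link] = counts.get(link, 0) + 1
        # Pick the link covering the most paths (ties broken alphabetically).
        best = max(sorted(counts), key=lambda link: counts[link])
        suspects.add(best)
        unexplained = {
            pid: cand for pid, cand in unexplained.items() if best not in cand
        }
    return suspects


# Hypothetical three-path topology: p1 and p2 fail, p3 still works.
example_paths = {"p1": ["a", "b"], "p2": ["a", "c"], "p3": ["d", "c"]}
print(localize_failures(example_paths, {"p1", "p2"}))
```

In this example, links `c` and `d` are cleared by the working path `p3`, which leaves `a` as the only link shared by both failed paths. The greedy choice favors explanations with few failed links, which is a modeling assumption rather than a guarantee.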
