A fine-grained multi-source measurement platform correlating routing transitions with packet losses

Abstract In this paper, we study the relationship between packet losses and routing changes in an operational network. To this end, we designed and deployed DCART, a monitoring platform, over RENATER, the French research and education network. Our platform collects four data sources using both active and passive measurements in order to unveil their temporal correlations. Active probing measures packet losses on specifically crafted data flows. These flows explore several load-balanced paths and make forwarding loops easier to reveal. Passive monitoring is achieved by listening to all routing updates from IS-IS, the intra-domain routing protocol in use, and by retrieving tickets generated by the Network Operations Center (NOC). During our monitoring campaign, we observed that most loss series were correlated with routing events, either because routing changes led to inconsistent state transitions or because faulty, and therefore lossy, links triggered numerous periods of link flapping. In particular, we show that losses due to forwarding loops resulting from inconsistent routing states are quite common when links come back after an outage. We also show that link flapping sometimes induces very long-lasting lossy periods that frequently go unnoticed by the NOC. A lightweight monitoring platform such as DCART could be used to better anticipate recurrent network outages and to improve the ticketing system.
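
The temporal correlation between losses and routing events described above can be illustrated with a minimal sketch. It is not DCART's actual procedure: the function name `correlate_losses`, the `window` parameter, and the representation of losses and routing events as plain timestamps in seconds are all assumptions made for illustration. Each loss is paired with the earliest routing event found within `window` seconds of it, or with `None` if there is none.

```python
from bisect import bisect_left


def correlate_losses(loss_times, event_times, window=5.0):
    """Pair each loss timestamp with the earliest routing event that
    falls within +/- `window` seconds of it (hypothetical helper).

    Returns a list of (loss_time, event_time_or_None) tuples.
    """
    events = sorted(event_times)
    pairs = []
    for t in loss_times:
        # Index of the first event at or after the start of the window.
        i = bisect_left(events, t - window)
        if i < len(events) and events[i] <= t + window:
            pairs.append((t, events[i]))
        else:
            # No routing event close enough: the loss stays unexplained.
            pairs.append((t, None))
    return pairs


if __name__ == "__main__":
    # Illustrative timestamps only, not measured data.
    losses = [10.0, 50.0, 100.0]
    routing_events = [8.0, 97.5, 300.0]
    for loss, event in correlate_losses(losses, routing_events):
        print(loss, "->", event)
```

Losses left paired with `None` are the ones a routing log alone would not explain; in a setting like the one described, they would be candidates for comparison against other sources such as NOC tickets.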
