STRID: Scalable Trigger-Based Route Incidence Diagnosis

Although the Internet keeps growing in importance, it still relies on a rather fragile routing design. From a network operator's perspective, it is therefore crucial to detect degraded end-to-end path performance caused by routing outages early, so that the problem can either be mitigated directly or reported to other entities that can mitigate it. In this work, we demonstrate the feasibility of a real-time tool for detecting degraded forwarding performance due to routing problems. Our tool passively monitors the traffic within the network and actively probes paths for which the TCP traffic characteristics indicate a possible routing problem. More importantly, our tool focuses on detecting routing events that actually affect network traffic, which are the most relevant from a network operator's perspective. Experimental results based on large-scale measurements in the Internet indicate that our tool effectively detects a significant number of routing outages and forwarding loops.
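The abstract does not state the exact trigger criteria, so the following is only a minimal sketch of the passive-trigger / active-probe split under assumed rules. It assumes a hypothetical heuristic: a destination /24 prefix becomes a probe candidate when at least half of its observed TCP flows retransmit without any newly acknowledged data. It also assumes that a probe result is a traceroute-style hop list in which a forwarding loop appears as a router that reappears after a different router. `FlowSample`, `suspicious_prefixes`, `has_forwarding_loop`, and all thresholds are illustrative names and values, not the tool's actual interface.

```python
from collections import defaultdict
from dataclasses import dataclass
from ipaddress import ip_network
from typing import Iterable, List, Optional, Sequence


@dataclass
class FlowSample:
    """Hypothetical per-flow summary produced by passive TCP monitoring."""
    dst_ip: str
    retransmissions: int  # retransmitted segments seen in the window
    acked_bytes: int      # bytes newly acknowledged in the window


def prefix_of(ip: str, plen: int = 24) -> str:
    """Map a destination address to its covering prefix (assumed /24)."""
    return str(ip_network(f"{ip}/{plen}", strict=False))


def suspicious_prefixes(samples: Iterable[FlowSample], min_flows: int = 3,
                        stalled_fraction: float = 0.5,
                        plen: int = 24) -> List[str]:
    """Return prefixes whose flows mostly retransmit without progress.

    Assumed trigger rule (illustrative only): a prefix qualifies if it has
    at least `min_flows` flows and at least `stalled_fraction` of them
    retransmitted while acknowledging no new data.
    """
    total = defaultdict(int)
    stalled = defaultdict(int)
    for s in samples:
        p = prefix_of(s.dst_ip, plen)
        total[p] += 1
        if s.retransmissions > 0 and s.acked_bytes == 0:
            stalled[p] += 1
    return sorted(p for p, n in total.items()
                  if n >= min_flows and stalled[p] / n >= stalled_fraction)


def has_forwarding_loop(hops: Sequence[Optional[str]]) -> bool:
    """Check a traceroute-style hop list for a repeated router.

    None marks an unresponsive hop and is skipped. Consecutive repeats of
    the same router are ignored; a router that reappears after a different
    router is treated as evidence of a forwarding loop.
    """
    seen = set()
    prev = None
    for hop in hops:
        if hop is None:
            continue
        if hop in seen and hop != prev:
            return True
        seen.add(hop)
        prev = hop
    return False
```

Aggregating per prefix rather than per flow means a single lossy connection does not trigger probing on its own. This keeps the active measurement load roughly proportional to the number of prefixes that actually show trouble. That property matters if the trigger is meant to run in real time alongside passive monitoring.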
