Towards Root Cause Analysis of Internet Routing Dynamics

The lack of a good understanding of the dynamics of interdomain routing has made efforts to address BGP’s shortcomings a black art. To gain more insight into these dynamics, we need to answer two questions: What is the cause of a routing change? Where does a routing change originate? This paper proposes the design of a BGP health inference system that answers these questions by observing routing updates from multiple vantage points and inferring the type and location of the event that triggers a routing change. To build such a system, we solve two basic problems: (a) classifying route updates into groups of correlated routing changes, where all route updates in a group are triggered by the same event, and (b) given the set of routing changes for an event, determining the location and the cause of the event. By analyzing route updates from Routeviews and RIPE over a period of months, we found that our approach can pinpoint the location where an update is triggered to a single inter-AS link for over 70% of observed updates. We found that the majority of updates are caused by a relatively small number of unstable links. In addition, 25% of prefixes are persistently unstable, causing 20% of all updates observed. Routes through the Internet core usually reconverge quickly after events, while an event at the network edge is 9 times more likely to cause a long-term route change. We validated our approach by showing that it can detect a variety of well-known events, namely: (a) session resets recorded in the NANOG mailing list; (b) routing problems within ISPs; (c) the location of BGP Beacons. In addition, our inference methodology is able to detect several routing problems that are not publicly known. In summary, we believe that our health inference system is a first step towards a better understanding of interdomain routing dynamics.
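The two sub-problems named in the abstract (grouping correlated updates, then locating the triggering event) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the `Update` record, the 60-second grouping window, and the rule "intersect the links that changed at each vantage point" are all assumptions made here for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class Update(NamedTuple):
    """Hypothetical record of one route change seen at one vantage point."""
    time: float        # seconds
    vantage: str       # name of the observing peer
    prefix: str        # destination prefix
    old_path: tuple    # AS path before the change
    new_path: tuple    # AS path after the change


def links(path):
    """Return the set of adjacent AS pairs (inter-AS links) on a path."""
    return {(a, b) for a, b in zip(path, path[1:])}


def group_updates(updates, window=60.0):
    """Group updates for the same prefix that arrive within `window`
    seconds of the previous one (assumed heuristic for 'same event')."""
    by_prefix = defaultdict(list)
    for u in sorted(updates, key=lambda u: u.time):
        by_prefix[u.prefix].append(u)

    groups = []
    for prefix_updates in by_prefix.values():
        current = [prefix_updates[0]]
        for u in prefix_updates[1:]:
            if u.time - current[-1].time <= window:
                current.append(u)
            else:
                groups.append(current)
                current = [u]
        groups.append(current)
    return groups


def candidate_links(group):
    """Intersect, across all updates in a group, the links that appeared
    or disappeared. A single surviving link is a candidate location."""
    candidates = None
    for u in group:
        changed = links(u.old_path) ^ links(u.new_path)
        candidates = changed if candidates is None else candidates & changed
    return candidates or set()
```

For example, if two vantage points both move away from paths that traverse the link between AS 3 and AS 5 within a few seconds of each other, the intersection of their changed links narrows to that single link. A real system would also need to handle withdrawals, policy changes, and events that affect many prefixes at once, none of which this sketch attempts.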
