On Understanding Transient Interdomain Routing Failures

Convergence of the interdomain routing protocol, BGP, can take as long as 30 minutes. Yet routing behavior during BGP route convergence is poorly understood. During route convergence, an end-to-end Internet path can experience a transient loss of reachability. We refer to this loss of reachability as a transient routing failure. Transient routing failures can lead to packet losses, and prolonged bursts of packet loss can make the performance of applications such as Voice-over-IP and interactive games unacceptable. In this paper, we study how transient routing failures can occur in the Internet. With the aid of a formal model that captures transient failures of the interdomain routing protocol, we derive sufficient conditions under which transient routing failures can occur. We further study transient routing failures in typical BGP systems where commonly used routing policies are applied. Network administrators can apply our analysis to improve the performance and stability of their networks.
