SWIFT: Predictive Fast Reroute (Technical Report)

Network operators often face remote outages in transit networks that lead to significant downtime, sometimes on the order of minutes. The issue is that BGP, the Internet's routing protocol, often converges slowly after such outages, because large bursts of messages must be processed and propagated router by router. In this paper, we present SWIFT, a fast-reroute framework that enables routers to restore connectivity within a few seconds of a remote outage. SWIFT is based on two novel techniques. First, SWIFT copes with slow outage notification by predicting the overall extent of a remote failure from only a few control-plane (BGP) messages. The key insight is that significant inference speed can be gained at the price of some accuracy. Second, SWIFT introduces a new data-plane encoding scheme that enables quick and flexible updates of the affected forwarding entries. SWIFT is deployable on existing devices without modifying BGP. We present a complete implementation of SWIFT and demonstrate that it is both fast and accurate. In our experiments with real BGP traces, SWIFT predicts the extent of a remote outage within a few seconds with an accuracy of ∼90% and can restore connectivity for 99% of the affected destinations.
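The abstract does not spell out SWIFT's inference algorithm, so the following is only a minimal, hypothetical sketch of the general idea of predicting an outage's extent from the first few BGP withdrawals. It assumes the router knows the current AS path of every prefix, scores each AS-level link by how well it explains the withdrawals seen so far, and then flags every prefix whose path crosses the best-scoring link, including prefixes whose withdrawals have not arrived yet. The scoring rule, the function names (`infer_failed_link`, `predicted_affected`), and the toy routing table are illustrative assumptions, not the method from the paper.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Link = Tuple[int, int]  # an AS-level link (a, b) as it appears on an AS path


def links_of(as_path: List[int]) -> List[Link]:
    """Return the consecutive AS-level links on an AS path."""
    return list(zip(as_path, as_path[1:]))


def infer_failed_link(
    rib: Dict[str, List[int]], withdrawn: List[str]
) -> Tuple[Optional[Link], float]:
    """Hypothetical heuristic (not SWIFT's algorithm): pick the link that
    best explains the withdrawals received so far.

    score(link) = (share of withdrawn prefixes whose path crosses the link)
                * (share of prefixes crossing the link that were withdrawn)
    """
    withdrawn_set = set(withdrawn)
    crossing: Counter = Counter()            # prefixes whose path uses the link
    withdrawn_crossing: Counter = Counter()  # subset of those already withdrawn
    for prefix, path in rib.items():
        for link in set(links_of(path)):
            crossing[link] += 1
            if prefix in withdrawn_set:
                withdrawn_crossing[link] += 1

    best: Optional[Link] = None
    best_score = 0.0
    for link, hits in withdrawn_crossing.items():
        score = (hits / len(withdrawn_set)) * (hits / crossing[link])
        if score > best_score:
            best, best_score = link, score
    return best, best_score


def predicted_affected(rib: Dict[str, List[int]], link: Link) -> List[str]:
    """All prefixes whose current path crosses the inferred link, withdrawn or not."""
    return sorted(p for p, path in rib.items() if link in links_of(path))


# Toy routing table (prefix -> AS path) and the first withdrawals of a burst.
EXAMPLE_RIB = {
    "10.0.1.0/24": [1, 2, 3],
    "10.0.2.0/24": [1, 2, 3, 4],
    "10.0.3.0/24": [1, 2, 3, 5],
    "10.0.4.0/24": [1, 6, 7],
    "10.0.5.0/24": [1, 2, 8],
}
EXAMPLE_WITHDRAWN = ["10.0.1.0/24", "10.0.2.0/24"]

if __name__ == "__main__":
    link, score = infer_failed_link(EXAMPLE_RIB, EXAMPLE_WITHDRAWN)
    print(link, round(score, 3))
    print(predicted_affected(EXAMPLE_RIB, link))
```

On the toy table, the two early withdrawals point to link (2, 3), and the prediction also flags 10.0.3.0/24 before its withdrawal arrives. Acting on such a prediction is faster than waiting for every withdrawal, at the risk of occasionally rerouting a prefix that was not actually affected, which mirrors the speed-for-accuracy trade-off the abstract describes.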
