Engineering Egress with Edge Fabric: Steering Oceans of Content to the World

Large content providers build points of presence around the world, each connected to tens or hundreds of networks. Ideally, this connectivity lets providers better serve users, but providers cannot obtain enough capacity on some preferred peering paths to handle peak traffic demands. These capacity constraints, coupled with volatile traffic and performance and the limitations of the 20-year-old BGP protocol, make it difficult to best use this connectivity. We present Edge Fabric, an SDN-based system we built and deployed to tackle these challenges for Facebook, which serves over two billion users from dozens of points of presence on six continents. We provide the first public details on the connectivity of a provider of this scale, including opportunities and challenges. We describe how Edge Fabric operates in near real-time to avoid congesting links at the edge of Facebook's network. Our evaluation on production traffic worldwide demonstrates that Edge Fabric efficiently uses interconnections without congesting them or degrading performance. We also present real-time performance measurements of available routes and investigate incorporating them into routing decisions. We relate challenges, solutions, and lessons from four years of operating and evolving Edge Fabric.
