Improving SD-WAN Resilience: From Vertical Handoff to WAN-Aware MPTCP

Demand for wide-area connectivity between enterprise site-edge networks and central-office core networks or cloud data centers has grown rapidly. Various software-defined wide area network (SD-WAN) solutions have been developed with the primary aim of improving WAN link utilization. However, the mechanisms used by existing SD-WAN solutions fail to provide the high reliability and performance required by today’s edge-to-cloud applications. In this article, we present WAN-aware MPTCP, which seamlessly aggregates multiple WAN links into a “big pipe” for better WAN resilience, thus minimizing application performance degradation under WAN link failures. We leverage the congestion control of MPTCP to balance traffic across multiple WAN links. The key innovation is to combine LAN virtualization at end systems with WAN virtualization at SD-WAN gateways. Through evaluation in both emulated testbeds and real-world deployment, we demonstrate the performance gain of WAN-aware MPTCP in terms of resilience and throughput over existing SD-WAN solutions.
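
To make the load-balancing claim concrete, the following is a minimal sketch of how a coupled MPTCP congestion controller can shift traffic between WAN links. The abstract does not say which algorithm is used; the sketch assumes the Linked Increases Algorithm (LIA) of RFC 6356. The names `Subflow`, `lia_alpha`, `on_ack`, and `on_loss` are hypothetical, windows are in bytes, and the loss reaction is a simplified halving.

```python
from dataclasses import dataclass


@dataclass
class Subflow:
    """One MPTCP subflow, e.g. one per WAN link (illustrative model only)."""
    cwnd: float      # congestion window, bytes
    rtt: float       # smoothed round-trip time, seconds
    mss: int = 1460  # maximum segment size, bytes


def lia_alpha(subflows):
    """Aggressiveness factor alpha as defined in RFC 6356, section 3."""
    total = sum(s.cwnd for s in subflows)
    best = max(s.cwnd / s.rtt ** 2 for s in subflows)
    denom = sum(s.cwnd / s.rtt for s in subflows) ** 2
    return total * best / denom


def on_ack(subflows, i, bytes_acked):
    """Coupled increase for subflow i during congestion avoidance."""
    s = subflows[i]
    total = sum(f.cwnd for f in subflows)
    coupled = lia_alpha(subflows) * bytes_acked * s.mss / total
    uncoupled = bytes_acked * s.mss / s.cwnd  # what a single TCP flow would gain
    s.cwnd += min(coupled, uncoupled)


def on_loss(subflows, i):
    """Assumed simplification: halve the window, as in standard TCP."""
    s = subflows[i]
    s.cwnd = max(s.cwnd / 2, s.mss)


if __name__ == "__main__":
    # Two hypothetical WAN links with equal windows and RTTs.
    links = [Subflow(cwnd=14600, rtt=0.05), Subflow(cwnd=14600, rtt=0.05)]
    on_loss(links, 1)          # link 1 sees a loss and backs off
    on_ack(links, 0, 1460)     # link 0 keeps growing
    print([round(f.cwnd, 1) for f in links])
```

Because the increase on each subflow depends on the total window across all subflows while the decrease is per subflow, a link that sees repeated losses keeps a smaller share of the connection's window, and traffic drifts toward the healthier links.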
