Managing Recurrent Virtual Network Updates in Multi-Tenant Datacenters: A System Perspective

With the advent of software-defined networking, network configuration through programmable interfaces has become practical, creating opportunities for on-demand routing updates in multi-tenant datacenters, where tenants have diverse routing requirements such as short latency, low path inflation, large bandwidth, and high reliability. Conventional solutions that rely on topology search coupled with an objective function to find desired routings have at least two shortcomings: (i) they run into scalability issues when handling consistent and frequent routing updates, and (ii) they restrict the flexibility and capability to satisfy various routing requirements. To address these issues, this paper proposes a novel design that decouples search from optimization. The design not only saves considerable topology search cost by reusing search results, but also avoids the possible sub-optimality of greedy routing search algorithms by making decisions based on a global view of all possible routings. We implement a prototype of our proposed system, OpReduce, and perform extensive evaluations to validate its design goals.
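
The decoupling idea can be illustrated with a minimal sketch. It is not OpReduce's implementation: the toy topology, the link metrics, and the names `candidate_paths`, `select_route`, `latency`, and `bottleneck_bandwidth` are all assumptions made for illustration. The search phase enumerates candidate paths once and caches them; the optimization phase then applies any tenant objective to the full cached set rather than searching greedily.

```python
from functools import lru_cache

# Hypothetical toy topology: node -> {neighbor: (latency_ms, bandwidth_gbps)}.
TOPOLOGY = {
    "h1": {"s1": (1, 40)},
    "s1": {"h1": (1, 40), "s2": (1, 10), "s3": (2, 40)},
    "s2": {"s1": (1, 10), "s4": (1, 10)},
    "s3": {"s1": (2, 40), "s4": (2, 40)},
    "s4": {"s2": (1, 10), "s3": (2, 40), "h2": (1, 40)},
    "h2": {"s4": (1, 40)},
}


@lru_cache(maxsize=None)
def candidate_paths(src, dst):
    """Search phase: enumerate all simple paths once; the cache enables reuse."""
    paths = []
    stack = [(src, (src,))]
    while stack:
        node, path = stack.pop()
        if node == dst:
            paths.append(path)
            continue
        for nxt in TOPOLOGY[node]:
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append((nxt, path + (nxt,)))
    return tuple(paths)


def latency(path):
    """Sum of per-link latencies along the path."""
    return sum(TOPOLOGY[a][b][0] for a, b in zip(path, path[1:]))


def bottleneck_bandwidth(path):
    """Minimum link bandwidth along the path."""
    return min(TOPOLOGY[a][b][1] for a, b in zip(path, path[1:]))


def select_route(src, dst, objective):
    """Optimization phase: apply an objective to the whole cached candidate set."""
    return objective(candidate_paths(src, dst))


# Two tenants with different requirements reuse the same search result.
low_latency = select_route("h1", "h2", lambda ps: min(ps, key=latency))
high_bandwidth = select_route("h1", "h2", lambda ps: max(ps, key=bottleneck_bandwidth))
```

In this sketch the second `select_route` call triggers no new topology search, and each objective sees every candidate path. Exhaustive enumeration only works on a toy graph; a real system would need to bound the candidate set.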
