EyeQ: Practical Network Performance Isolation at the Edge

The datacenter network is shared among untrusted tenants in a public cloud and among hundreds of services in a private cloud, yet today we lack fine-grained control over how network bandwidth is partitioned across tenants. In this paper we present EyeQ, a simple and practical system that provides tenants with bandwidth guarantees as if their endpoints were connected to a dedicated switch. To realize this goal, EyeQ leverages the high bisection bandwidth of a datacenter fabric and enforces admission control on traffic, regardless of the tenant's transport protocol. We show that this pushes bandwidth contention to the network's edge, enabling EyeQ to support end-to-end minimum bandwidth guarantees to tenant endpoints in a simple and scalable manner at the servers. EyeQ requires no changes to applications and is deployable with the network support available today. We evaluate EyeQ with an efficient software implementation at 10 Gb/s speeds using unmodified applications and adversarial traffic patterns. Our evaluation demonstrates EyeQ's promise of predictable network performance isolation. For instance, even with an adversarial tenant sending bursty UDP traffic, EyeQ is able to maintain the 99.9th percentile latency for a collocated memcached application close to that of a dedicated deployment.
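To make the idea of a minimum bandwidth guarantee at the edge concrete, the following is a minimal sketch of how one receiver's access link could be divided among tenants. It is not EyeQ's algorithm, which the abstract does not specify; it is a generic weighted max-min (water-filling) allocation. The function name `allocate`, the tenant names, and the numbers are all hypothetical, and it assumes the guarantees never oversubscribe the link.

```python
def allocate(capacity, guarantees, demands, eps=1e-9):
    """Weighted max-min split of one link's capacity (illustrative only).

    capacity   -- link rate, e.g. in Gb/s
    guarantees -- tenant -> minimum guaranteed rate (also used as the weight)
    demands    -- tenant -> rate the tenant currently wants to send or receive
    Every tenant ends up with at least min(demand, guarantee); spare
    capacity is shared among still-hungry tenants in proportion to
    their guarantees.
    """
    if sum(guarantees.values()) > capacity:
        raise ValueError("guarantees oversubscribe the link")

    rates = {t: 0.0 for t in guarantees}
    active = {t for t in guarantees if demands.get(t, 0.0) > eps}
    remaining = float(capacity)

    while active and remaining > eps:
        total_weight = sum(guarantees[t] for t in active)
        if total_weight <= 0:
            break  # only zero-guarantee tenants left; nothing to weight by
        spent = 0.0
        satisfied = set()
        for t in active:
            share = remaining * guarantees[t] / total_weight
            grant = min(share, demands[t] - rates[t])
            rates[t] += grant
            spent += grant
            if demands[t] - rates[t] <= eps:
                satisfied.add(t)
        remaining -= spent
        if not satisfied:
            break  # every active tenant took its full share; link is full
        active -= satisfied  # redistribute leftovers among the rest
    return rates


if __name__ == "__main__":
    # Hypothetical 10 Gb/s link shared by a light memcached tenant and a
    # tenant blasting UDP, each with a 5 Gb/s guarantee.
    print(allocate(10, {"memcached": 5, "udp": 5},
                   {"memcached": 2, "udp": 100}))
```

In this sketch the aggressive tenant can use capacity the other tenant leaves idle, but it can never push the other tenant below its guarantee once that tenant's demand rises.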
