Re-architecting datacenter networks and stacks for low latency and high performance

Modern datacenter networks provide very high capacity via redundant Clos topologies and low switch latency, but transport protocols rarely deliver matching performance. We present NDP, a novel data-center transport architecture that achieves near-optimal completion times for short transfers and high flow throughput in a wide range of scenarios, including incast. NDP switch buffers are very shallow and when they fill the switches trim packets to headers and priority forward the headers. This gives receivers a full view of instantaneous demand from all senders, and is the basis for our novel, high-performance, multipath-aware transport protocol that can deal gracefully with massive incast events and prioritize traffic from different senders on RTT timescales. We implemented NDP in Linux hosts with DPDK, in a software switch, in a NetFPGA-based hardware switch, and in P4. We evaluate NDP's performance in our implementations and in large-scale simulations, simultaneously demonstrating support for very low-latency and high throughput.

[1]  Chuang Lin,et al.  Catch the Whole Lot in an Action: Rapid Precise Packet Loss Notification in Data Center , 2014, NSDI.

[2]  Haitao Wu,et al.  BCube: a high performance, server-centric network architecture for modular data centers , 2009, SIGCOMM '09.

[3]  Sally Floyd,et al.  Dynamics of TCP traffic over ATM networks , 1994 .

[4]  Matthew Mathis,et al.  RFC6928 - Increasing TCP's Initial Window , 2013 .

[5]  Haitao Wu,et al.  RDMA over Commodity Ethernet at Scale , 2016, SIGCOMM.

[6]  Alex C. Snoeren,et al.  Inside the Social Network's (Datacenter) Network , 2015, Comput. Commun. Rev..

[7]  Nandita Dukkipati,et al.  Increasing TCP's Initial Window , 2013, RFC.

[8]  Junda Liu,et al.  Multi-enterprise networking , 2000 .

[9]  Ming Zhang,et al.  Congestion Control for Large-Scale RDMA Deployments , 2015, Comput. Commun. Rev..

[10]  Ramana Rao Kompella,et al.  On the impact of packet spraying in data center networks , 2013, 2013 Proceedings IEEE INFOCOM.

[11]  Sally Floyd,et al.  IAB Concerns Regarding Congestion Control for Voice Traffic in the Internet , 2004, RFC.

[12]  Yuchung Cheng,et al.  RFC 7413 - TCP Fast Open , 2014 .

[13]  Amin Vahdat,et al.  Less Is More: Trading a Little Bandwidth for Ultra-Low Latency in the Data Center , 2012, NSDI.

[14]  Antony I. T. Rowstron,et al.  Better never than late: meeting deadlines in datacenter networks , 2011, SIGCOMM.

[15]  Brighten Godfrey,et al.  Finishing flows quickly with preemptive scheduling , 2012, CCRV.

[16]  Amin Vahdat,et al.  TIMELY: RTT-based Congestion Control for the Datacenter , 2015, Comput. Commun. Rev..

[17]  Albert G. Greenberg,et al.  VL2: a scalable and flexible data center network , 2009, SIGCOMM '09.

[18]  T. N. Vijaykumar,et al.  Deadline-aware datacenter tcp (D2TCP) , 2012, SIGCOMM '12.

[19]  Mark Handley,et al.  Improving datacenter performance and robustness with multipath TCP , 2011, SIGCOMM.

[20]  Ankit Singla,et al.  Jellyfish: Networking Data Centers Randomly , 2011, NSDI.

[21]  Amin Vahdat,et al.  A scalable, commodity data center network architecture , 2008, SIGCOMM '08.

[22]  Amar Phanishayee,et al.  Safe and effective fine-grained TCP retransmissions for datacenter communication , 2009, SIGCOMM '09.

[23]  Nick McKeown,et al.  pFabric: minimal near-optimal datacenter transport , 2013, SIGCOMM.

[24]  Van Jacobson,et al.  Traffic phase effects in packet-switched gateways , 1991, CCRV.

[25]  Van Jacobson,et al.  Congestion avoidance and control , 1988, SIGCOMM '88.

[26]  Robert Braden,et al.  T/TCP - TCP Extensions for Transactions Functional Specification , 1994, RFC.

[27]  Gautam Kumar,et al.  pHost: distributed near-optimal datacenter transport over commodity network fabric , 2015, CoNEXT.

[28]  Amin Vahdat,et al.  Hedera: Dynamic Flow Scheduling for Data Center Networks , 2010, NSDI.

[29]  Albert G. Greenberg,et al.  Data center TCP (DCTCP) , 2010, SIGCOMM '10.

[30]  Andrew W. Moore,et al.  NetFPGA SUME: Toward 100 Gbps as Research Commodity , 2014, IEEE Micro.

[31]  David A. Maltz,et al.  Network traffic characteristics of data centers in the wild , 2010, IMC '10.

[32]  Michael J. Freedman,et al.  Scalable, optimal flow routing in datacenters via local link balancing , 2013, CoNEXT.

[33]  David L. Black,et al.  The Addition of Explicit Congestion Notification (ECN) to IP , 2001, RFC.

[34]  George Varghese,et al.  CONGA: distributed congestion-aware load balancing for datacenters , 2015, SIGCOMM.

[35]  Keqiang He,et al.  Presto: Edge-based Load Balancing for Fast Datacenter Networks , 2015, SIGCOMM.

[36]  Devavrat Shah,et al.  Fastpass , 2014, SIGCOMM.