Enabling ECN over Generic Packet Scheduling

Explicit Congestion Notification (ECN) is crucial for production datacenters, but the current queue-length-based ECN/RED implementation does not work with generic packet schedulers, leading to either degraded network performance or violated scheduling policies. In this paper, we first examine this issue and show that ECN/RED fails because it is difficult to measure queue capacities that change under various schedulers and traffic dynamics. We then present Time-based Congestion Notification (TCN), a simple yet effective ECN solution that combines two successful ideas: the sojourn time from CoDel and the instantaneous marking from DCTCP. By using packet sojourn time, rather than queue length, as the congestion signal, TCN eliminates the need to measure dynamic queue capacities, making it suitable for arbitrary schedulers with traffic dynamics. By performing stateless instantaneous ECN marking rather than complex stateful dropping, TCN is designed to be inexpensive to implement on commodity switching chips. Through extensive testbed experiments and large-scale simulations, we show that TCN can strictly preserve scheduling policies while providing desirable network performance. For example, in a testbed experiment with a production workload, TCN reduces the average and 99th percentile completion times for small flows by up to 82.8% and 95.3%, respectively, compared to current practice.
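The marking idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class and field names (`Packet`, `SojournMarkingQueue`, `threshold_s`) and the threshold value used in the usage note are assumptions made for illustration. The sketch timestamps each packet at enqueue and, at dequeue, sets the ECN Congestion Experienced (CE) mark when the packet's sojourn time exceeds a fixed threshold. The decision depends only on the departing packet, so no per-queue state is carried between packets.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    size_bytes: int
    ecn_capable: bool = True    # sender negotiated ECN (ECT codepoint set)
    ce_marked: bool = False     # Congestion Experienced mark
    enqueue_time: float = 0.0   # timestamp written when the packet is enqueued


class SojournMarkingQueue:
    """Hypothetical FIFO queue that marks on sojourn time instead of queue length."""

    def __init__(self, threshold_s: float) -> None:
        # threshold_s is an assumed tuning parameter, in seconds.
        self.threshold_s = threshold_s
        self._packets = deque()

    def enqueue(self, pkt: Packet, now: float) -> None:
        pkt.enqueue_time = now
        self._packets.append(pkt)

    def dequeue(self, now: float) -> Optional[Packet]:
        if not self._packets:
            return None
        pkt = self._packets.popleft()
        sojourn = now - pkt.enqueue_time
        # Instantaneous, stateless decision: only this packet's own delay matters.
        if pkt.ecn_capable and sojourn > self.threshold_s:
            pkt.ce_marked = True
        return pkt
```

For example, with an assumed threshold of 100 microseconds, a packet that waits 50 microseconds leaves unmarked, while one that waits 200 microseconds leaves with the CE mark set. Because the check runs at dequeue, the same logic applies regardless of which scheduler decides when the queue is served.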
