A Distributed Algorithm to Calculate Max-Min Fair Rates Without Per-Flow State

Most congestion control algorithms, like TCP, rely on a reactive control system that detects congestion and then moves carefully towards a desired operating point (e.g. by modifying the window size or adjusting a rate). To balance stability against convergence speed, they often take hundreds of RTTs to converge, a growing problem as networks get faster and leave less time to react. This paper is about an alternative class of congestion control algorithms based on proactive scheduling: switches and NICs proactively exchange control messages to run a distributed algorithm that picks explicit rates for each flow. We call these Proactive Explicit Rate Control (PERC) algorithms. They take as input the routing matrix and link speeds, but not a congestion signal. By exploiting information such as the number of flows at a link, they can converge an order of magnitude faster than reactive algorithms. Our main contributions are (1) s-PERC ("stateless" PERC), a new practical distributed PERC algorithm without per-flow state at the switches, and (2) a proof that s-PERC computes exact max-min fair rates in a known bounded time, the first such algorithm to do so without per-flow state. To analyze s-PERC, we introduce a parallel variant of standard waterfilling, 2-Waterfilling. We prove that s-PERC converges to the max-min fair rates in 6N rounds, where N is the number of iterations 2-Waterfilling takes for the same routing matrix. We describe how to make s-PERC practical and robust to deploy in real networks. We confirm using realistic simulations and an FPGA hardware testbed that s-PERC converges 10-100x faster than reactive algorithms like TCP, DCTCP and RCP in data-center networks and 1.3-6x faster in wide-area networks (WANs). Long flows complete in close to the ideal time, while short-lived flows are prioritized, making s-PERC appropriate for both data-centers and WANs.
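For readers unfamiliar with the target allocation, the following is a minimal sketch of standard centralized waterfilling, the textbook procedure for computing max-min fair rates from link capacities and flow routes. It is not s-PERC and not 2-Waterfilling, whose details are not given in this abstract. The function name `waterfill`, the dictionary-based inputs, and the tolerance `eps` are assumptions chosen for illustration, and every flow is assumed to cross at least one link.

```python
def waterfill(capacity, routes, eps=1e-12):
    """Standard waterfilling sketch (illustrative, not the paper's algorithm).

    capacity: dict mapping link name -> link capacity
    routes:   dict mapping flow name -> list of links the flow crosses
              (assumed non-empty for every flow)
    Returns a dict mapping flow name -> max-min fair rate.
    """
    remaining = dict(capacity)                      # unallocated capacity per link
    active = {f: set(links) for f, links in routes.items()}
    rates = {}

    while active:
        # Count the not-yet-frozen flows on each link.
        counts = {}
        for links in active.values():
            for link in links:
                counts[link] = counts.get(link, 0) + 1

        # Fair share each link could give its remaining flows.
        share = {link: remaining[link] / n for link, n in counts.items()}

        # The links with the smallest fair share are this iteration's bottlenecks.
        level = min(share.values())
        bottlenecks = {link for link, s in share.items() if s <= level + eps}

        # Freeze every flow crossing a bottleneck at the bottleneck rate
        # and subtract its rate from every link on its path.
        frozen = [f for f, links in active.items() if links & bottlenecks]
        for f in frozen:
            rates[f] = level
            for link in active[f]:
                remaining[link] -= level
            del active[f]

    return rates
```

As a usage example, with links `A` (capacity 10) and `B` (capacity 4) and flows `f1` on `A`, `f2` on `A` and `B`, and `f3` on `B`, link `B` is the first bottleneck and fixes `f2` and `f3` at rate 2; `f1` then takes the remaining 8 units on `A`. Each pass of the loop corresponds to one waterfilling iteration.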
