Distributed Bottleneck-Aware Coflow Scheduling in Data Centers

With the rapid development of data-parallel frameworks, the coflow abstraction has been widely adopted in data center transport designs because it captures application-level semantics. Coflow completion time (CCT) is a key metric for accelerating job completion, and coflow scheduling is the most effective and widely adopted means of optimizing it. However, most existing coflow scheduling mechanisms neglect the ubiquitous in-network bottlenecks and schedule coflows under the non-blocking giant-switch hypothesis. This practice is likely to cause undesired link contention inside the fabric, which ultimately impairs CCT performance. To address this problem, we propose a Distributed Bottleneck-Aware coflow scheduling algorithm, DBA, which approximates the minimum remaining time first (MRTF) heuristic on all fabric-wide links. In this way, core link bandwidth is allocated to coflows as intended and CCT performance is not degraded. As an evolutionary algorithm, DBA enhances the traditional dual decomposition method and thus converges to the optimal bandwidth allocation very quickly. Extensive simulations verify DBA's outstanding CCT performance and high link utilization. Furthermore, DBA introduces very little overhead and is robust to routing strategies, parameter variations, and computation delays.
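To make the contrast with the giant-switch assumption concrete, the following is a minimal, centralized sketch of an MRTF-style ordering that accounts for every link on a flow's path. It is not the DBA algorithm itself, which is distributed and based on dual decomposition; the function names, link names, capacities, and flow sizes are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def bottleneck_remaining_time(flows, capacity):
    """Remaining time of one coflow, limited by its most loaded link.

    flows: list of (remaining_bytes, path), where path is a list of link ids.
    capacity: dict mapping link id to bandwidth (bytes per time unit).
    """
    load = defaultdict(float)
    for remaining, path in flows:
        for link in path:
            load[link] += remaining  # bytes this coflow still sends over the link
    # The slowest (bottleneck) link determines when the coflow can finish.
    return max((load[link] / capacity[link] for link in load), default=0.0)

def mrtf_order(coflows, capacity):
    """Order coflow names by ascending bottleneck remaining time (MRTF)."""
    return sorted(coflows, key=lambda name: bottleneck_remaining_time(coflows[name], capacity))

# Hypothetical topology: two edge links and one slower core link.
capacity = {"h1-up": 10.0, "core": 5.0, "h2-down": 10.0}
coflows = {
    "A": [(20.0, ["h1-up", "core", "h2-down"])],  # crosses the core link
    "B": [(30.0, ["h1-up", "h2-down"])],          # stays on edge links only
}
order = mrtf_order(coflows, capacity)
```

If only the edge links were considered, coflow A (20 / 10 = 2 time units) would be scheduled before B (30 / 10 = 3). Once the core link is included, A needs 20 / 5 = 4 time units, so B is served first.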
