Weaver: Efficient Coflow Scheduling in Heterogeneous Parallel Networks

Leveraging application-level requirements expressed as Coflows has been shown to improve application-level communication efficiency. However, most existing work assumes that all application traffic is serviced by a single monolithic network. This oversimplified assumption no longer holds in a modern, evolving data center that operates multiple generations of network fabrics, an architecture we define as Heterogeneous Parallel Networks (HPNs). In this paper, we present Weaver, the first scheduler that addresses the Coflow management problem in HPNs. Achieving high communication efficiency for applications is crucial to exploiting HPNs fully, yet it is challenging because of the complex traffic patterns of applications and the heterogeneous bandwidth distribution in HPNs. Weaver addresses these challenges at two levels. At the microscopic level, for each application, Weaver exploits the distributed bandwidth in HPNs with an efficient algorithm that we prove to be within a constant factor of optimal. At the macroscopic level, which involves multiple applications, Weaver can adopt a range of application traffic scheduling policies as desired by the system operator. Under realistic traffic, Weaver enables HPNs to service Coflows as efficiently as a monolithic network with equivalent aggregate capacity.
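To make the microscopic-level problem concrete, the following is a minimal sketch of assigning the flows of one application to parallel networks of different bandwidths. It is not Weaver's algorithm, which the abstract does not specify; it uses the classic largest-first greedy heuristic for machines of different speeds as a stand-in. The function name `assign_flows`, its parameters, and the simplification that each network is a single shared pipe carrying its flows one after another are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def assign_flows(flow_sizes: List[float],
                 bandwidths: List[float]) -> Tuple[Dict[int, int], float]:
    """Greedy largest-first assignment of flows to heterogeneous networks.

    Hypothetical illustration only: each network is modeled as one pipe
    with a fixed bandwidth, and a flow is never split across networks.
    Returns (flow index -> network index, completion time of the last flow).
    """
    if not bandwidths:
        raise ValueError("at least one network is required")

    # Time at which each network finishes the flows assigned to it so far.
    finish = [0.0] * len(bandwidths)
    assignment: Dict[int, int] = {}

    # Consider flows from largest to smallest.
    order = sorted(range(len(flow_sizes)), key=lambda i: -flow_sizes[i])
    for i in order:
        # Pick the network on which this flow would complete earliest.
        best = min(range(len(bandwidths)),
                   key=lambda n: finish[n] + flow_sizes[i] / bandwidths[n])
        finish[best] += flow_sizes[i] / bandwidths[best]
        assignment[i] = best

    return assignment, max(finish)


if __name__ == "__main__":
    # Three flows (sizes in arbitrary units) over a fast and a slow network.
    mapping, completion = assign_flows([6, 3, 3], [2, 1])
    print(mapping, completion)
```

In this toy instance the largest flow and one small flow share the faster network while the other small flow uses the slower one. The sketch ignores per-port contention and competing applications, which the abstract treats separately at the macroscopic level.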
