Coflow scheduling in input-queued switches: Optimal delay scaling and algorithms

A coflow is a collection of parallel flows belonging to the same job. It has the all-or-nothing property: a coflow is not complete until all of its constituent flows have completed. In this paper, we focus on optimizing coflow-level delay, i.e., the time to complete all the flows in a coflow, in the context of an N × N input-queued switch. In particular, we develop a throughput-optimal scheduling policy that achieves the best scaling of coflow-level delay as N → ∞. We first derive lower bounds on the coflow-level delay that can be achieved by any scheduling policy. These lower bounds depend critically on the variability of flow sizes. We then analyze the coflow-level performance of some existing coflow-agnostic scheduling policies and show that none of them achieves provably optimal performance with respect to coflow-level delay. Finally, we propose the Coflow-Aware Batching (CAB) policy, which achieves the optimal scaling of coflow-level delay under some mild assumptions.
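The all-or-nothing property can be illustrated with a small sketch. It is a toy illustration, not the paper's model, bounds, or the CAB policy: it ignores switch contention entirely and only shows that a coflow's delay is set by its slowest flow, so the expected delay grows with the number of flows when flow sizes are variable. The function names, and the choice of geometric flow sizes for the variable case, are assumptions made for this example.

```python
import random


def coflow_delay(arrival_time, flow_completion_times):
    """Coflow-level delay: time from arrival until the last flow finishes."""
    return max(flow_completion_times) - arrival_time


def mean_max_geometric(num_flows, mean_size, trials=2000, seed=0):
    """Monte Carlo estimate of E[max of num_flows i.i.d. geometric sizes].

    Assumption for illustration: each flow, served in isolation, takes a
    geometric number of time slots with the given mean.
    """
    rng = random.Random(seed)
    p = 1.0 / mean_size  # per-slot completion probability

    def geometric():
        # Number of Bernoulli(p) trials up to and including the first success.
        k = 1
        while rng.random() >= p:
            k += 1
        return k

    total = 0
    for _ in range(trials):
        total += max(geometric() for _ in range(num_flows))
    return total / trials


if __name__ == "__main__":
    # A coflow arriving at time 2 whose flows finish at times 5, 9 and 7
    # is complete only at time 9.
    print(coflow_delay(2, [5, 9, 7]))
    # With variable (geometric) sizes, the mean of the maximum grows with
    # the number of flows.
    for n in (1, 4, 16, 64):
        print(n, mean_max_geometric(n, mean_size=4))
```

If every flow had the same fixed size, the maximum would equal that size regardless of the number of flows. The gap between that case and the geometric case is one simple way to see why variability in flow sizes matters for coflow-level delay.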
