Saath: Speeding up CoFlows by Exploiting the Spatial Dimension

CoFlow scheduling improves the performance of data-intensive applications by improving their network performance. State-of-the-art CoFlow schedulers essentially approximate the classic online Shortest-Job-First (SJF) policy, which was designed for a single CPU, in a distributed setting. They do not coordinate how the flows of a CoFlow are scheduled at individual ports, and as a result they suffer two performance drawbacks. (1) The out-of-sync problem: the flows of a CoFlow may be scheduled at different times and drift apart, which increases the CoFlow completion time (CCT). (2) FIFO scheduling of flows at each port carries no notion of SJF, which leads to suboptimal CCT. We propose Saath, an online CoFlow scheduler that overcomes these drawbacks by explicitly exploiting the spatial dimension of CoFlows. In Saath, the global scheduler schedules the flows of a CoFlow using an all-or-none policy, which mitigates the out-of-sync problem. To order the CoFlows within each queue, Saath uses a Least-Contention-First (LCoF) policy, complemented with starvation freedom, which we show extends the gist of SJF to the spatial dimension. Our evaluation using an Azure testbed and simulations of two production cluster traces shows that, compared to Aalo, Saath reduces the CCT in median (P90) cases by 1.53x (4.5x) and 1.42x (37x), respectively.
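To make the two mechanisms in the abstract concrete, here is a minimal Python sketch of one scheduling round over a single queue. It rests on several assumptions that do not come from the paper. "Contention" is taken to be the number of other CoFlows in the same queue that share at least one port with a CoFlow. Starvation freedom is approximated by a hypothetical aging threshold, `STARVATION_LIMIT`. Flow sizes, link rates, and flow completion are not modeled at all. The names `CoFlow`, `contention`, `lcof_key`, and `schedule_round` are illustrative only. This is a toy model of the all-or-none and LCoF ideas, not Saath's actual scheduler.

```python
from dataclasses import dataclass

# Hypothetical aging threshold (in scheduling rounds); an assumption, not a value from the paper.
STARVATION_LIMIT = 3


@dataclass(eq=False)  # eq=False: CoFlows compare by identity, not by field values
class CoFlow:
    name: str
    ports: frozenset  # ports on which this CoFlow has pending flows (assumed model)
    arrival: int      # arrival order, used as the final tie-breaker
    waited: int = 0   # consecutive rounds this CoFlow has been held back


def contention(c, queue):
    """Assumed metric: number of other CoFlows in the queue sharing at least one port with c."""
    return sum(1 for other in queue if other is not c and c.ports & other.ports)


def lcof_key(c, queue):
    """Sort key: starving CoFlows first, then least contention, then earliest arrival."""
    starving = c.waited >= STARVATION_LIMIT
    return (not starving, contention(c, queue), c.arrival)


def schedule_round(queue):
    """Run one toy scheduling round and return the names of the admitted CoFlows."""
    busy = set()  # ports already claimed in this round
    admitted = []
    for c in sorted(queue, key=lambda x: lcof_key(x, queue)):
        # All-or-none: start c only if every one of its ports is free;
        # otherwise none of its flows start, so they cannot drift apart.
        if c.ports.isdisjoint(busy):
            busy |= c.ports
            admitted.append(c)
    # Aging: held-back CoFlows accumulate waiting rounds; admitted ones reset.
    for c in queue:
        if c in admitted:
            c.waited = 0
        else:
            c.waited += 1
    return [c.name for c in admitted]


if __name__ == "__main__":
    queue = [
        CoFlow("A", frozenset({1, 2, 3}), arrival=0),
        CoFlow("B", frozenset({1}), arrival=1),
        CoFlow("C", frozenset({2}), arrival=2),
        CoFlow("D", frozenset({4}), arrival=3),
    ]
    for rnd in range(1, 5):
        print(rnd, schedule_round(queue))
    # Rounds 1-3 print ['D', 'B', 'C']; round 4 prints ['A', 'D'].
```

In this toy queue, a FIFO order would admit A first and then block both B and C. The LCoF order instead admits the three low-contention CoFlows in each round. Because flows never complete in this model, A is held back until it has waited `STARVATION_LIMIT` rounds, at which point the aging rule moves it to the front.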
