MXDAG: A Hybrid Abstraction for Emerging Applications

Emerging distributed applications, such as microservices, machine learning, and big data analytics, consist of both compute and network tasks. The DAG abstraction primarily targets compute tasks and offers no explicit network-level scheduling. In contrast, the Coflow abstraction collectively schedules the network flows between compute tasks but lacks an end-to-end view of the application DAG. Because the two types of tasks depend on and interact with each other, considering only one of them is sub-optimal. We argue that co-scheduling compute and network tasks can move applications toward globally optimal end-to-end performance; however, none of the existing abstractions provides fine-grained enough information for such co-scheduling. We propose MXDAG, an abstraction that treats both compute and network tasks explicitly. MXDAG captures the dependencies and interactions between the two kinds of tasks, enabling improved application performance.
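To make the abstraction concrete, the sketch below models a DAG whose vertices are explicitly typed as compute or network tasks, so that a single scheduler sees both kinds of work in the same ready frontier. This is a minimal illustration under our own assumptions, not MXDAG's actual API; the `Task`, `TaskKind`, and `ready_tasks` names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum

class TaskKind(Enum):
    COMPUTE = "compute"   # runs on CPU/GPU workers
    NETWORK = "network"   # a flow/coflow moving data between workers

@dataclass
class Task:
    name: str
    kind: TaskKind
    deps: list = field(default_factory=list)  # names of tasks that must finish first

def ready_tasks(tasks, done):
    """Return tasks whose dependencies are all complete. A co-scheduler
    can dispatch compute tasks to workers and network tasks to the
    fabric from this single frontier, instead of scheduling the two
    kinds of tasks in isolation."""
    return [t for t in tasks if t.name not in done
            and all(d in done for d in t.deps)]

# A tiny map/shuffle/reduce-style DAG with an explicit network task.
tasks = [
    Task("map1", TaskKind.COMPUTE),
    Task("map2", TaskKind.COMPUTE),
    Task("shuffle", TaskKind.NETWORK, deps=["map1", "map2"]),
    Task("reduce", TaskKind.COMPUTE, deps=["shuffle"]),
]

done = set()
while len(done) < len(tasks):
    frontier = ready_tasks(tasks, done)
    print([f"{t.name}({t.kind.value})" for t in frontier])
    done.update(t.name for t in frontier)
```

Running the sketch prints the frontier level by level: the two map tasks, then the shuffle flow, then the reduce task. The point is that the network task occupies the same scheduling plane as the compute tasks it connects, which is the information a co-scheduler needs.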
