MXDAG: A Hybrid Abstraction for Emerging Applications
Xinyu Crystal Wu | Zhuang Wang | Weitao Wang | Sushovan Das | Ang Chen | T. S. Eugene Ng
[1] P. López,et al. Triggerflow: trigger-based orchestration of serverless workflows , 2020, DEBS.
[2] Mohamed Faten Zhani,et al. PRISM: Fine-Grained Resource-Aware Scheduling for MapReduce , 2015, IEEE Transactions on Cloud Computing.
[3] Ion Stoica,et al. Efficient coflow scheduling with Varys , 2014, SIGCOMM.
[4] Amar Phanishayee,et al. Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads , 2020, OSDI.
[5] Carlo Curino,et al. Apache Hadoop YARN: yet another resource negotiator , 2013, SoCC.
[6] Panruo Wu,et al. Wukong: a scalable and locality-enhanced framework for serverless parallel computing , 2020, SoCC.
[7] Yibo Zhu,et al. A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters , 2020, OSDI.
[8] Saurabh Gupta,et al. A Multi-faceted Approach to Job Placement for Improved Performance on Extreme-Scale Systems , 2016, SC16: International Conference for High Performance Computing, Networking, Storage and Analysis.
[9] Chuan Wu,et al. Optimus: an efficient dynamic resource scheduler for deep learning clusters , 2018, EuroSys.
[10] Amar Phanishayee,et al. Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training , 2018, SoCC.
[11] Alexandru Iosup,et al. Trace-based evaluation of job runtime and queue wait time predictions in grids , 2009, HPDC '09.
[12] Ola Svensson,et al. (Acyclic) Job Shops are Hard to Approximate , 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.
[13] Yuan Yu,et al. Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.
[14] Ishai Menache,et al. Network-Aware Scheduling for Data-Parallel Jobs: Plan When You Can , 2015, SIGCOMM.
[15] Fabien Hermenier,et al. Multi-objective job placement in clusters , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[16] Michael I. Jordan,et al. Managing data transfers in computer clusters with orchestra , 2011, SIGCOMM.
[17] Frédéric Giroire,et al. When Network Matters: Data Center Scheduling with Network Tasks , 2019, IEEE INFOCOM 2019 - IEEE Conference on Computer Communications.
[18] Ion Stoica,et al. Efficient Coflow Scheduling Without Prior Knowledge , 2015, SIGCOMM.
[19] Hongzi Mao,et al. Learning scheduling algorithms for data processing clusters , 2018, SIGCOMM.
[20] Jaesik Choi,et al. HetPipe: Enabling Large DNN Training on (Whimpy) Heterogeneous GPU Clusters through Integration of Pipelined Model Parallelism and Data Parallelism , 2020, USENIX ATC.
[21] Antony I. T. Rowstron,et al. Decentralized task-aware scheduling for data center networks , 2014, SIGCOMM.
[22] Ziyang Li,et al. Branch Scheduling: DAG-Aware Scheduling for Speeding up Data-Parallel Jobs , 2019, 2019 IEEE/ACM 27th International Symposium on Quality of Service (IWQoS).
[23] Srikanth Kandula,et al. Leveraging endpoint flexibility in data-intensive clusters , 2013, SIGCOMM.
[24] Ion Stoica,et al. Caerus: NIMBLE Task Scheduling for Serverless Analytics , 2021, NSDI.
[25] Dhabaleswar K. Panda,et al. A Comprehensive Study of MapReduce Over Lustre for Intermediate Data Placement and Shuffle Strategies on HPC Clusters , 2017, IEEE Transactions on Parallel and Distributed Systems.
[26] Kang G. Shin,et al. Tiresias: A GPU Cluster Manager for Distributed Deep Learning , 2019, NSDI.
[27] Carlo Curino,et al. Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications , 2015, SIGMOD Conference.
[28] Abhishek Verma,et al. Large-scale cluster management at Google with Borg , 2015, EuroSys.
[29] Srikanth Kandula,et al. Graphene: Packing and Dependency-Aware Scheduling for Data-Parallel Clusters , 2016, OSDI.
[30] Robert N. M. Watson,et al. Firmament: Fast, Centralized Cluster Scheduling at Scale , 2016, OSDI.
[31] Zhen Zhang,et al. Is Network the Bottleneck of Distributed Training? , 2020, NetAI@SIGCOMM.
[32] Christian Scheideler,et al. Improved Bounds for Acyclic Job Shop Scheduling , 2002, Comb..
[33] Seif Haridi,et al. Apache Flink™: Stream and Batch Processing in a Single Engine , 2015, IEEE Data Eng. Bull..
[34] George Bosilca,et al. Hierarchical DAG Scheduling for Hybrid Distributed Systems , 2015, 2015 IEEE International Parallel and Distributed Processing Symposium.
[35] Srikanth Kandula,et al. Multi-resource packing for cluster schedulers , 2014, SIGCOMM.
[36] Alexander Sergeev,et al. Horovod: fast and easy distributed deep learning in TensorFlow , 2018, ArXiv.
[37] Ion Stoica,et al. Coflow: a networking abstraction for cluster applications , 2012, HotNets-XI.
[38] Dong H. Ahn,et al. PRIONN: Predicting Runtime and IO using Neural Networks , 2018, ICPP.
[39] Jipeng Zhou,et al. Efficient online coflow routing and scheduling , 2016, MobiHoc.
[40] Wenguang Chen,et al. Spread-n-share: improving application performance and cluster throughput with resource-aware job placement , 2019, SC.
[41] Yibo Zhu,et al. A generic communication scheduler for distributed DNN training acceleration , 2019, SOSP.
[42] Alexander J. Smola,et al. Scaling Distributed Machine Learning with the Parameter Server , 2014, OSDI.
[43] David A. Maltz,et al. Surviving failures in bandwidth-constrained datacenters , 2012, SIGCOMM.
[44] Aditya Akella,et al. Altruistic Scheduling in Multi-Resource Clusters , 2016, OSDI.