Grouper: Accelerating Hyperparameter Searching in Deep Learning Clusters With Network Scheduling
Pan Zhou | Gang Sun | Hongfang Yu
[1] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[2] Beng Chin Ooi, et al. Rafiki: Machine Learning as an Analytics Service System, 2018, Proc. VLDB Endow.
[3] Kang G. Shin, et al. Tiresias: A GPU Cluster Manager for Distributed Deep Learning, 2019, NSDI.
[4] Bertrand M. T. Lin, et al. Parallel dedicated machine scheduling with conflict graphs, 2018, Comput. Ind. Eng.
[5] Chen Tian, et al. Scheduling Coflows of Multi-stage Jobs to Minimize the Total Weighted Job Completion Time, 2018, IEEE INFOCOM 2018 - IEEE Conference on Computer Communications.
[6] Michael J. Freedman, et al. SLAQ: quality-driven scheduling for distributed machine learning, 2017, SoCC.
[7] Lars Kotthoff, et al. Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA, 2017, J. Mach. Learn. Res.
[8] Andreas S. Schulz. Scheduling to Minimize Total Weighted Completion Time: Performance Guarantees of LP-Based Heuristics and Lower Bounds, 1996, IPCO.
[9] Wei Bai, et al. Information-Agnostic Flow Scheduling for Commodity Data Centers, 2015, NSDI.
[10] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[11] Enhong Chen, et al. One more queue is enough: Minimizing flow completion time with explicit priority notification, 2017, IEEE INFOCOM 2017 - IEEE Conference on Computer Communications.
[12] Hong Liu, et al. Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network, 2015, Comput. Commun. Rev.
[13] Ion Stoica, et al. Efficient coflow scheduling with Varys, 2014, SIGCOMM.
[14] Fred Baker, et al. Configuration Guidelines for DiffServ Service Classes, 2006, RFC.
[15] Pengtao Xie, et al. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters, 2017, USENIX Annual Technical Conference.
[16] Matthieu Cord, et al. GoSGD: Distributed Optimization for Deep Learning with Gossip Exchange, 2018, Neurocomputing.
[17] Antony I. T. Rowstron, et al. Decentralized task-aware scheduling for data center networks, 2014, SIGCOMM.
[18] Thomas R. Henderson, et al. Network Simulations with the ns-3 Simulator, 2008.
[19] Olatunji Ruwase, et al. HyperDrive: exploring hyperparameters with POP scheduling, 2017, Middleware.
[20] Michel X. Goemans, et al. Improved approximation algorithms for scheduling with release dates, 1997, SODA '97.
[21] David M. Brooks, et al. Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective, 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[22] Samir Khuller, et al. Select and Permute: An Improved Online Framework for Scheduling to Minimize Weighted Completion Time, 2017, LATIN.
[23] Alexander J. Smola, et al. Scaling Distributed Machine Learning with the Parameter Server, 2014, OSDI.
[24] Chuan Wu, et al. Deep Learning-based Job Placement in Distributed Machine Learning Clusters, 2019, IEEE INFOCOM 2019 - IEEE Conference on Computer Communications.
[25] Wencong Xiao, et al. Gandiva: Introspective Cluster Scheduling for Deep Learning, 2018, OSDI.
[26] Wencong Xiao, et al. Multi-tenant GPU Clusters for Deep Learning Workloads: Analysis and Implications, 2018.
[27] Sheng Wang, et al. Towards Practical and Near-Optimal Coflow Scheduling for Data Center Networks, 2016, IEEE Transactions on Parallel and Distributed Systems.
[28] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.
[29] David B. Shmoys, et al. Scheduling to Minimize Average Completion Time: Off-Line and On-Line Approximation Algorithms, 1997, Math. Oper. Res.
[30] Nick McKeown, et al. A Distributed Algorithm to Calculate Max-Min Fair Rates Without Per-Flow State, 2019, Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems.
[31] Amin Vahdat, et al. Sincronia: near-optimal network design for coflows, 2018, SIGCOMM.
[32] Chuan Wu, et al. Optimus: an efficient dynamic resource scheduler for deep learning clusters, 2018, EuroSys.
[33] Khaled A. Harras, et al. Eiffel: Efficient and Flexible Software Packet Scheduling, 2018, NSDI.
[34] Jianping Wu, et al. Joint optimization of tasks placement and routing to minimize Coflow Completion Time, 2019, J. Netw. Comput. Appl.
[35] Maurice Queyranne, et al. Structure of a simple scheduling polyhedron, 1993, Math. Program.
[36] Wencong Xiao, et al. Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads, 2019, USENIX Annual Technical Conference.
[37] Ke Li, et al. Efficient File Dissemination in Data Center Networks With Priority-Based Adaptive Multicast, 2020, IEEE Journal on Selected Areas in Communications.
[38] Amin Vahdat, et al. A scalable, commodity data center network architecture, 2008, SIGCOMM '08.
[39] Samir Khuller, et al. On Scheduling Coflows, 2020, Algorithmica.
[40] Yoshua Bengio, et al. Random Search for Hyper-Parameter Optimization, 2012, J. Mach. Learn. Res.
[41] Lars Kotthoff, et al. Automated Machine Learning: Methods, Systems, Challenges, 2019, The Springer Series on Challenges in Machine Learning.
[42] Xingang Shi, et al. Efficient Scheduling of Weighted Coflows in Data Centers, 2019, IEEE Transactions on Parallel and Distributed Systems.
[43] Ameet Talwalkar, et al. Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization, 2016, J. Mach. Learn. Res.
[44] Hans Kellerer, et al. Parallel dedicated machines scheduling with chain precedence constraints, 2012, Eur. J. Oper. Res.
[45] Amit Kumar, et al. Order Scheduling Models: Hardness and Algorithms, 2007, FSTTCS.
[46] Ola Svensson, et al. Minimizing the sum of weighted completion times in a concurrent open shop, 2010, Oper. Res. Lett.
[47] Alessandro Agnetis, et al. Scheduling three chains on two parallel machines, 2010, Eur. J. Oper. Res.
[48] Ameet Talwalkar, et al. Non-stochastic Best Arm Identification and Hyperparameter Optimization, 2015, AISTATS.