A Sum-of-Ratios Multi-Dimensional-Knapsack Decomposition for DNN Resource Scheduling