Online Job Scheduling in Distributed Machine Learning Clusters
Zongpeng Li | Chuan Wu | Yanghua Peng | Yixin Bao
[1] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[2] Karin Strauss,et al. Accelerating Deep Convolutional Neural Networks Using Specialized Hardware , 2015.
[3] Jie Jiang,et al. Angel: a new large-scale machine learning system , 2018.
[4] Bo Li,et al. Scheduling jobs across geo-distributed datacenters with max-min fairness , 2017, IEEE INFOCOM 2017 - IEEE Conference on Computer Communications.
[5] Zongpeng Li,et al. An Efficient Cloud Market Mechanism for Computing Jobs With Soft Deadlines , 2017, IEEE/ACM Transactions on Networking.
[6] Wei Lin,et al. Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing , 2014, OSDI.
[7] Benjamin Hindman,et al. Dominant Resource Fairness: Fair Allocation of Multiple Resource Types , 2011, NSDI.
[8] C. Lang,et al. The need for speed: Better movement quality during faster task performance after stroke , 2015.
[9] Martin Wattenberg,et al. Ad click prediction: a view from the trenches , 2013, KDD.
[10] Scott Shenker,et al. Spark: Cluster Computing with Working Sets , 2010, HotCloud.
[11] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[12] Randy H. Katz,et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center , 2011, NSDI.
[13] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Randy H. Katz,et al. Heterogeneity and dynamicity of clouds at scale: Google trace analysis , 2012, SoCC '12.
[15] Mung Chiang,et al. Need for speed: CORA scheduler for optimizing completion-times in the cloud , 2015, 2015 IEEE Conference on Computer Communications (INFOCOM).
[16] Alexander J. Smola,et al. Scaling Distributed Machine Learning with the Parameter Server , 2014, OSDI.
[17] Shengen Yan,et al. Towards Distributed Machine Learning in Shared Clusters: A Dynamically-Partitioned Approach , 2017, 2017 IEEE International Conference on Smart Computing (SMARTCOMP).
[18] Yossi Azar,et al. Truthful Online Scheduling with Commitments , 2015, EC.
[19] Trishul M. Chilimbi,et al. Project Adam: Building an Efficient and Scalable Deep Learning Training System , 2014, OSDI.
[20] Zongpeng Li,et al. Online Auctions in IaaS Clouds: Welfare and Profit Maximization With Server Costs , 2015, IEEE/ACM Transactions on Networking.
[21] Ishai Menache,et al. Efficient online scheduling for deadline-sensitive jobs: extended abstract , 2013, SPAA.
[22] Forrest N. Iandola,et al. FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Carlo Curino,et al. Apache Hadoop YARN: yet another resource negotiator , 2013, SoCC.
[24] Joo Seong Jeong,et al. Dolphin: Runtime Optimization for Distributed Machine Learning , 2016.
[25] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[26] Zheng Zhang,et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.
[27] Abhishek Verma,et al. Large-scale cluster management at Google with Borg , 2015, EuroSys.
[28] Joseph Naor,et al. The Design of Competitive Online Algorithms via a Primal-Dual Approach , 2009, Found. Trends Theor. Comput. Sci.
[29] Mor Harchol-Balter,et al. TetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters , 2016, EuroSys.
[30] Carlo Curino,et al. Reservation-based Scheduling: If You're Late Don't Blame Us! , 2014, SoCC.
[31] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[32] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[33] David E. Irwin,et al. Balancing risk and reward in a market-based task service , 2004, Proceedings. 13th IEEE International Symposium on High performance Distributed Computing, 2004.
[34] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[35] Bo Li,et al. Optimizing coflow completion times with utility max-min fairness , 2016, IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications.
[36] Olatunji Ruwase,et al. Performance Modeling and Scalability Optimization of Distributed Deep Learning Systems , 2015, KDD.
[37] Haipeng Luo,et al. Automatic Scaling of Internet Applications for Cloud Computing Services , 2014, IEEE Transactions on Computers.