Deep Learning-based Job Placement in Distributed Machine Learning Clusters
Chuan Wu | Yanghua Peng | Yixin Bao
[1] Carlo Curino,et al. Apache Hadoop YARN: yet another resource negotiator , 2013, SoCC.
[2] Cheng-Zhong Xu,et al. Interference and locality-aware task scheduling for MapReduce applications in virtual clusters , 2013, HPDC.
[3] Demis Hassabis,et al. Mastering the game of Go without human knowledge , 2017, Nature.
[4] Dejan S. Milojicic,et al. HPC-Aware VM Placement in Infrastructure Clouds , 2013, 2013 IEEE International Conference on Cloud Engineering (IC2E).
[5] Randy H. Katz,et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center , 2011, NSDI.
[6] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[7] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[8] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[9] Chi Harold Liu,et al. Experience-driven Networking: A Deep Reinforcement Learning based Approach , 2018, IEEE INFOCOM 2018 - IEEE Conference on Computer Communications.
[10] Michael J. Freedman,et al. SLAQ: quality-driven scheduling for distributed machine learning , 2017, SoCC.
[11] Zongpeng Li,et al. Online Job Scheduling in Distributed Machine Learning Clusters , 2018, IEEE INFOCOM 2018 - IEEE Conference on Computer Communications.
[12] Srikanth Kandula,et al. Multi-resource packing for cluster schedulers , 2014, SIGCOMM.
[13] Hai Jin,et al. Heterogeneity and Interference-Aware Virtual Machine Provisioning for Predictable Performance in the Cloud , 2016, IEEE Transactions on Computers.
[14] Christina Delimitrou,et al. Quasar: resource-efficient and QoS-aware cluster management , 2014, ASPLOS.
[15] Hongzi Mao,et al. Learning Graph-based Cluster Scheduling Algorithms , 2018.
[16] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[17] Hai Jin,et al. Network-Aware Task Assignment for MapReduce Applications in Shared Clusters , 2015.
[18] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[19] Tom Schaul,et al. StarCraft II: A New Challenge for Reinforcement Learning , 2017, ArXiv.
[20] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[21] Randy H. Katz,et al. Heterogeneity and dynamicity of clouds at scale: Google trace analysis , 2012, SoCC '12.
[22] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Samy Bengio,et al. Device Placement Optimization with Reinforcement Learning , 2017, ICML.
[24] Hongzi Mao,et al. Neural Adaptive Video Streaming with Pensieve , 2017, SIGCOMM.
[25] Srikanth Kandula,et al. Resource Management with Deep Reinforcement Learning , 2016, HotNets.
[26] Chuan Wu,et al. Optimus: an efficient dynamic resource scheduler for deep learning clusters , 2018, EuroSys.
[27] Alexander J. Smola,et al. Scaling Distributed Machine Learning with the Parameter Server , 2014, OSDI.
[28] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[29] Shengen Yan,et al. Towards Distributed Machine Learning in Shared Clusters: A Dynamically-Partitioned Approach , 2017, 2017 IEEE International Conference on Smart Computing (SMARTCOMP).
[30] Christina Delimitrou,et al. Paragon: QoS-aware scheduling for heterogeneous datacenters , 2013, ASPLOS '13.
[31] Qinru Qiu,et al. A Hierarchical Framework of Cloud Resource Allocation and Power Management Using Deep Reinforcement Learning , 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).
[32] Alex Graves,et al. Asynchronous Methods for Deep Reinforcement Learning , 2016, ICML.
[33] H. Howie Huang,et al. TRACON: Interference-Aware Scheduling for Data-Intensive Applications in Virtualized Environments , 2011, IEEE Transactions on Parallel and Distributed Systems.
[34] Zheng Zhang,et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.
[35] Abhishek Verma,et al. Large-scale cluster management at Google with Borg , 2015, EuroSys.