GAI: A Centralized Tree-Based Scheduler for Machine Learning Workload in Large Shared Clusters
Hongming Cai | Rui Ren | Ce Gao
[1] Jie Jiang, et al. Angel: a new large-scale machine learning system, 2018.
[2] Sergey Ioffe, et al. Rethinking the Inception Architecture for Computer Vision, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Athanasios V. Vasilakos, et al. Daphne: A Flexible and Hybrid Scheduling Framework in Multi-Tenant Clusters, 2018, IEEE Transactions on Network and Service Management.
[4] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[5] Abhishek Verma, et al. Large-scale cluster management at Google with Borg, 2015, EuroSys.
[6] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[8] Alexander J. Smola, et al. Scaling Distributed Machine Learning with the Parameter Server, 2014, OSDI.
[9] Fang Dong, et al. BAR: An Efficient Data Locality Driven Task Scheduling Algorithm for Cloud Computing, 2011, 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[10] Kejiang Ye, et al. Imbalance in the cloud: An analysis on Alibaba cluster trace, 2017, 2017 IEEE International Conference on Big Data (Big Data).
[11] Christina Delimitrou, et al. Paragon: QoS-aware scheduling for heterogeneous datacenters, 2013, ASPLOS '13.
[12] Randy H. Katz, et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, 2011, NSDI.
[13] David M. Brooks, et al. Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective, 2018, 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[14] Christina Delimitrou, et al. Quasar: resource-efficient and QoS-aware cluster management, 2014, ASPLOS.
[15] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[16] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[17] Michael Abd-El-Malek, et al. Omega: flexible, scalable schedulers for large compute clusters, 2013, EuroSys '13.
[18] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[19] Patrick Wendell, et al. Sparrow: distributed, low latency scheduling, 2013, SOSP.
[20] Carlo Curino, et al. Apache Hadoop YARN: yet another resource negotiator, 2013, SoCC.
[21] Palash Bera, et al. How colors in business dashboards affect users' decision making, 2016, Commun. ACM.
[22] Robert N. M. Watson, et al. Firmament: Fast, Centralized Cluster Scheduling at Scale, 2016, OSDI.
[23] Ryan Kastner, et al. A hardware accelerated system for high throughput cellular image analysis, 2018, J. Parallel Distributed Comput.
[24] Carlo Curino, et al. Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters, 2015, USENIX Annual Technical Conference.
[25] Anne-Marie Kermarrec, et al. Hawk: Hybrid Datacenter Scheduling, 2015, USENIX Annual Technical Conference.