Multi-tenant GPU Clusters for Deep Learning Workloads: Analysis and Implications
Wencong Xiao | Amar Phanishayee | Myeongjae Jeon | Junjie Qian | Shivaram Venkataraman | Fan Yang
[1] Dror G. Feitelson, et al. Packing Schemes for Gang Scheduling, 1996, JSSPP.
[2] Helen J. Wang, et al. Online aggregation, 1997, SIGMOD '97.
[3] Surajit Chaudhuri, et al. Dynamic sample selection for approximate query processing, 2003, SIGMOD '03.
[4] Luiz André Barroso, et al. Web Search for a Planet: The Google Cluster Architecture, 2003, IEEE Micro.
[5] David E. Culler, et al. The ganglia distributed monitoring system: design, implementation, and experience, 2004, Parallel Comput.
[6] Chris Jermaine, et al. Scalable approximate query processing with the DBO engine, 2007, SIGMOD '07.
[7] Andrew V. Goldberg, et al. Quincy: fair scheduling for distributed computing clusters, 2009, SOSP '09.
[8] Joseph M. Hellerstein, et al. Online aggregation and continuous query support in MapReduce, 2010, SIGMOD Conference.
[9] Hairong Kuang, et al. The Hadoop Distributed File System, 2010, IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).
[10] Scott Shenker, et al. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling, 2010, EuroSys '10.
[11] Lukáš Burget, et al. Recurrent neural network based language model, 2010, INTERSPEECH.
[12] Rajeev Gandhi, et al. An Analysis of Traces from a Production MapReduce Cluster, 2010, 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[13] Albert G. Greenberg, et al. Scarlett: coping with skewed content popularity in mapreduce clusters, 2011, EuroSys '11.
[14] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[15] Hermann Ney, et al. LSTM Neural Networks for Language Modeling, 2012, INTERSPEECH.
[16] Michael Abd-El-Malek, et al. Omega: flexible, scalable schedulers for large compute clusters, 2013, EuroSys '13.
[17] Carlo Curino, et al. Apache Hadoop YARN: yet another resource negotiator, 2013, SoCC.
[18] Alexander J. Smola, et al. Scaling Distributed Machine Learning with the Parameter Server, 2014, OSDI.
[19] John Langford, et al. A reliable effective terascale linear learning system, 2011, J. Mach. Learn. Res.
[20] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.
[21] Sameer Agarwal, et al. Queries with Bounded Errors & Bounded Response Times on Very Large Data, 2014.
[22] Wei Lin, et al. Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing, 2014, OSDI.
[23] Adam Wierman, et al. GRASS: Trimming Stragglers in Approximation Analytics, 2014, NSDI '14.
[24] Ion Stoica, et al. The Power of Choice in Data-Aware Cluster Scheduling, 2014, OSDI.
[25] Abhishek Verma, et al. Large-scale cluster management at Google with Borg, 2015, EuroSys.
[26] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[27] Djamel Djenouri, et al. Distributed Low-Latency Data Aggregation Scheduling in Wireless Sensor Networks, 2015, ACM Trans. Sens. Networks.
[28] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[29] Amit Agarwal, et al. CNTK: Microsoft's Open-Source Deep-Learning Toolkit, 2016, KDD.
[30] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[32] Ion Stoica, et al. iOLAP: Managing Uncertainty for Efficient Incremental OLAP, 2016, SIGMOD Conference.
[33] Ioannis Mitliagkas, et al. Omnivore: An Optimizer for Multi-device Deep Learning on CPUs and GPUs, 2016, ArXiv.
[34] Quan Chen, et al. Baymax: QoS Awareness and Increased Utilization for Non-Preemptive Accelerators in Warehouse Scale Computers, 2016, ASPLOS.
[35] Natalia Gimelshein, et al. vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design, 2016, 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[36] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[37] C. Rossbach, et al. Full Virtualization for GPUs Reconsidered, 2017.
[38] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[39] Paramvir Bahl, et al. Live Video Analytics at Scale with Approximation and Delay-Tolerance, 2017, NSDI.
[40] Xin Wang, et al. DeepProf: Performance Analysis for Deep Learning Applications via Mining GPU Execution Patterns, 2017, ArXiv.
[41] Nikil Dutt, et al. Special session: quality-configurable memory hierarchy through approximation, 2017, International Conference on Compilers, Architectures and Synthesis For Embedded Systems (CASES).
[42] Michael J. Freedman, et al. SLAQ: quality-driven scheduling for distributed machine learning, 2017, SoCC.
[43] Quan Chen, et al. Prophet: Precise QoS Prediction on Non-Preemptive Accelerators to Improve Utilization in Warehouse-Scale Computers, 2017, ASPLOS.
[44] Purushottam Kulkarni. Dynamic GPU Memory Management for Training Deep Neural Networks, 2018.
[45] Wencong Xiao, et al. Gandiva: Introspective Cluster Scheduling for Deep Learning, 2018, OSDI.
[46] Chuan Wu, et al. Optimus: an efficient dynamic resource scheduler for deep learning clusters, 2018, EuroSys.
[47] Zenglin Xu, et al. Superneurons: dynamic GPU memory management for training deep neural networks, 2018, PPoPP.
[48] Brian Kingsbury, et al. Kernel Approximation Methods for Speech Recognition, 2017, J. Mach. Learn. Res.