Hoplite: efficient and fault-tolerant collective communication for task-based distributed systems
Ion Stoica | Stephanie Wang | Eric Liang | Robert Nishihara | Philipp Moritz | Danyang Zhuo | Siyuan Zhuang | Zhuohan Li
[1] Nikhil R. Devanur, et al. PipeDream: generalized pipeline parallelism for DNN training, 2019, SOSP.
[2] Mark Sandler, et al. MobileNetV2: Inverted Residuals and Linear Bottlenecks, 2018, CVPR.
[3] Matthew Rocklin, et al. Dask: Parallel Computation with Blocked Algorithms and Task Scheduling, 2015, SciPy.
[4] Forrest N. Iandola, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size, 2016, ArXiv.
[5] Xiangyu Zhang, et al. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design, 2018, ECCV.
[6] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[7] Alexander J. Smola, et al. Scaling Distributed Machine Learning with the Parameter Server, 2014, OSDI.
[8] Nikhil R. Devanur, et al. Blink: Fast and Generic Collectives for Distributed ML, 2019, MLSys.
[9] Miguel Castro, et al. SplitStream: high-bandwidth multicast in cooperative environments, 2003, SOSP.
[10] Christopher Olston, et al. TensorFlow-Serving: Flexible, High-Performance ML Serving, 2017, ArXiv.
[11] Michael I. Jordan, et al. RLlib: Abstractions for Distributed Reinforcement Learning, 2017, ICML.
[12] Ion Stoica, et al. Efficient coflow scheduling with Varys, 2014, SIGCOMM.
[13] Yibo Zhu, et al. A generic communication scheduler for distributed DNN training acceleration, 2019, SOSP.
[14] Richard L. Graham, et al. Open MPI: A Flexible High Performance MPI, 2005, PPAM.
[15] Michael I. Jordan, et al. Managing data transfers in computer clusters with Orchestra, 2011, SIGCOMM.
[16] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[17] Ion Stoica, et al. Efficient Coflow Scheduling Without Prior Knowledge, 2015, SIGCOMM.
[18] Michael I. Jordan, et al. Ray: A Distributed Framework for Emerging AI Applications, 2017, OSDI.
[19] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPoPP.
[20] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[21] Xin Zhang, et al. TFX: A TensorFlow-Based Production-Scale Machine Learning Platform, 2017, KDD.
[22] Xin Wang, et al. Clipper: A Low-Latency Online Prediction Serving System, 2016, NSDI.
[23] Aaron Q. Li, et al. Parameter Server for Distributed Machine Learning, 2013.
[24] Miguel Castro, et al. Scribe: a large-scale and decentralized application-level multicast infrastructure, 2002, IEEE J. Sel. Areas Commun.
[25] Stephanie Wang, et al. Lineage stash: fault tolerance off the critical path, 2019, SOSP.
[26] Shane Legg, et al. IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures, 2018, ICML.
[27] Paramvir Bahl, et al. Low Latency Geo-distributed Data Analytics, 2015, SIGCOMM.
[28] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[29] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.
[30] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[31] Joseph E. Gonzalez, et al. A fault-tolerance shim for serverless computing, 2020, EuroSys.
[32] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[33] Quoc V. Le, et al. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks, 2019, ICML.
[34] Steven Hand, et al. CIEL: A Universal Execution Engine for Distributed Data-Flow Computing, 2011, NSDI.
[35] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[36] Sujata Banerjee, et al. Application-driven bandwidth guarantees in datacenters, 2014, SIGCOMM.
[37] Alexander Sergeev, et al. Horovod: fast and easy distributed deep learning in TensorFlow, 2018, ArXiv.
[38] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[39] Ion Stoica, et al. Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure, 2019, NSDI.
[40] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.