Kai Chen | Hao Wang | Yiming Zhang | Yiqing Ma
[1] Nikhil R. Devanur, et al. PipeDream: generalized pipeline parallelism for DNN training, 2019, SOSP.
[2] Luciano Floridi, et al. GPT-3: Its Nature, Scope, Limits, and Consequences, 2020, Minds and Machines.
[3] Forrest N. Iandola, et al. DenseNet: Implementing Efficient ConvNet Descriptor Pyramids, 2014, ArXiv.
[4] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[5] Michael M. Swift, et al. ATP: In-network Aggregation for Multi-tenant Learning, 2021, NSDI.
[6] Mo Dong, et al. PCC: Re-architecting Congestion Control for Consistent High Performance, 2014, NSDI.
[7] Shivnath Babu, et al. Tuning Database Configuration Parameters with iTuned, 2009, Proc. VLDB Endow.
[8] Ion Stoica, et al. Efficient coflow scheduling with Varys, 2014, SIGCOMM.
[9] Albert G. Greenberg, et al. Data center TCP (DCTCP), 2010, SIGCOMM.
[10] Kai Chen, et al. Rethinking Transport Layer Design for Distributed Machine Learning, 2019, APNet.
[11] Sheng Wang, et al. Rapier: Integrating routing and scheduling for coflow-aware data center networks, 2015, INFOCOM.
[12] Hao Wang, et al. Domain-specific Communication Optimization for Distributed DNN Training, 2020, ArXiv.
[13] Wei Bai, et al. Information-Agnostic Flow Scheduling for Commodity Data Centers, 2015, NSDI.
[14] Sangeetha Abdu Jyothi, et al. TicTac: Accelerating Distributed Deep Learning with Communication Scheduling, 2018, MLSys.
[15] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[16] David Botstein, et al. SGD: Saccharomyces Genome Database, 1998, Nucleic Acids Res.
[17] Gennady Pekhimenko, et al. Priority-based Parameter Propagation for Distributed DNN Training, 2019, SysML.
[18] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[19] Byung-Gon Chun, et al. Automating System Configuration of Distributed Machine Learning, 2019, ICDCS.
[20] William J. Dally, et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training, 2017, ICLR.
[21] Yibo Zhu, et al. A generic communication scheduler for distributed DNN training acceleration, 2019, SOSP.
[22] Michael K. Buckland, et al. Annual Review of Information Science and Technology, 2006, J. Documentation.
[23] Onur Mutlu, et al. Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds, 2017, NSDI.
[24] Pengtao Xie, et al. Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters, 2017, USENIX ATC.
[25] Yiming Zhang, et al. Rate-aware flow scheduling for commodity data center networks, 2017, INFOCOM.
[26] Barbara J. Grosz, et al. Natural-Language Processing, 1982, Artificial Intelligence.
[27] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[28] Yibo Zhu, et al. A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters, 2020, OSDI.
[29] Feng Liu, et al. AuTO: scaling deep reinforcement learning for datacenter-scale automatic traffic optimization, 2018, SIGCOMM.
[30] Xin Yuan, et al. Bandwidth optimal all-reduce algorithms for clusters of workstations, 2009, J. Parallel Distributed Comput.
[31] Sebastian U. Stich, et al. Local SGD Converges Fast and Communicates Little, 2018, ICLR.
[32] Nando de Freitas, et al. A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning, 2010, ArXiv.
[33] Pieter Hintjens, et al. ZeroMQ: Messaging for Many Applications, 2013.
[34] Haitao Wu, et al. RDMA over Commodity Ethernet at Scale, 2016, SIGCOMM.
[35] Zheng Zhang, et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems, 2015, ArXiv.
[36] Kai Chen, et al. Programmable Switch as a Parallel Computing Device, 2018, ArXiv.
[37] Kai Chen, et al. RAT - Resilient Allreduce Tree for Distributed Machine Learning, 2020, APNet.
[38] Minlan Yu, et al. CherryPick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics, 2017, NSDI.
[39] Chuck Yoo, et al. TensorExpress: In-Network Communication Scheduling for Distributed Deep Learning, 2020, IEEE CLOUD.
[40] Dan Alistarh, et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks, 2016, arXiv:1610.02132.
[41] Kaiyong Zhao, et al. AutoML: A Survey of the State-of-the-Art, 2019, Knowl. Based Syst.
[42] Kai Chen, et al. Towards Zero Copy Dataflows using RDMA, 2017, SIGCOMM Posters and Demos.
[43] Panos Kalnis, et al. Scaling Distributed Machine Learning with In-Network Aggregation, 2019, NSDI.
[44] Alexander J. Smola, et al. Scaling Distributed Machine Learning with the Parameter Server, 2014, OSDI.
[45] Kai Chen, et al. Divide-and-Shuffle Synchronization for Distributed Machine Learning, 2020, ArXiv.
[46] Kai Chen, et al. Stream: Decentralized opportunistic inter-coflow scheduling for datacenter networks, 2016, ICNP.
[47] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.
[48] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, CVPR.
[49] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[50] Seunghak Lee, et al. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server, 2013, NIPS.
[51] Yanhui Geng, et al. CODA: Toward Automatically Identifying and Scheduling Coflows in the Dark, 2016, SIGCOMM.
[52] Haitao Wu, et al. Towards minimal-delay deadline-driven data center TCP, 2013, HotNets.