Round-Robin Synchronization: Mitigating Communication Bottlenecks in Parameter Servers