Accelerating Distributed Training in Heterogeneous Clusters via a Straggler-Aware Parameter Server
Xi Li, Yahui Hu, Zongwei Zhu, Yuming Cheng, Huihuang Yu, XiangLan Chen