Tian Ye | Ruoyu Sun | Peijun Xiao