Communication Efficient Sparsification for Large Scale Machine Learning
Mikael Johansson | Arda Aytekin | Sarit Khirirat | Sindri Magnússon