暂无分享,去创建一个
[1] Aryan Mokhtari,et al. Robust and Communication-Efficient Collaborative Learning , 2019, NeurIPS.
[2] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[3] Martin Jaggi,et al. Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication , 2019, ICML.
[4] Michael G. Rabbat,et al. Stochastic Gradient Push for Distributed Deep Learning , 2018, ICML.
[5] Martin Jaggi,et al. Sparsified SGD with Memory , 2018, NeurIPS.
[6] Kenneth Heafield,et al. Sparse Communication for Distributed Gradient Descent , 2017, EMNLP.
[7] Martin Jaggi,et al. Decentralized Deep Learning with Arbitrary Communication Compression , 2019, ICLR.
[8] Cong Xu,et al. TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning , 2017, NIPS.
[9] Ali H. Sayed,et al. A Linearly Convergent Proximal Gradient Algorithm for Decentralized Optimization , 2019, NeurIPS.
[10] Sebastian U. Stich,et al. Local SGD Converges Fast and Communicates Little , 2018, ICLR.
[11] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[12] Rong Jin,et al. On the Linear Speedup Analysis of Communication Efficient Momentum SGD for Distributed Non-Convex Optimization , 2019, ICML.
[13] Kamyar Azizzadenesheli,et al. signSGD: compressed optimisation for non-convex problems , 2018, ICML.
[14] Dan Alistarh,et al. QSGD: Communication-Optimal Stochastic Gradient Descent, with Applications to Training Neural Networks , 2016, 1610.02132.
[15] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[16] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[18] Hanlin Tang,et al. Communication Compression for Decentralized Training , 2018, NeurIPS.
[19] William J. Dally,et al. Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training , 2017, ICLR.
[20] Xiang Li,et al. Communication Efficient Decentralized Training with Multiple Local Updates , 2019, ArXiv.
[21] Shenghuo Zhu,et al. Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning , 2018, AAAI.
[22] Wei Zhang,et al. Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent , 2017, NIPS.
[23] Tao Lin,et al. Don't Use Large Mini-Batches, Use Local SGD , 2018, ICLR.
[24] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.