Jian Li | Guojun Xiong | Rahul Singh | Gang Yan