[1] Satrajit Chatterjee, et al. Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization, 2020, ICLR.
[2] Richard Nock, et al. Advances and Open Problems in Federated Learning, 2021, Found. Trends Mach. Learn.
[3] Yoshua Bengio, et al. Three Factors Influencing Minima in SGD, 2017, ArXiv.
[4] Venkatesh Saligrama, et al. Federated Learning Based on Dynamic Regularization, 2021, ICLR.
[5] Jascha Sohl-Dickstein, et al. Measuring the Effects of Data Parallelism on Neural Network Training, 2018, J. Mach. Learn. Res.
[6] Angelia Nedic, et al. Distributed Gradient Methods for Convex Machine Learning Problems in Networks: Distributed Optimization, 2020, IEEE Signal Processing Magazine.
[7] David M. Blei, et al. Stochastic Gradient Descent as Approximate Bayesian Inference, 2017, J. Mach. Learn. Res.
[8] Raef Bassily, et al. The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning, 2017, ICML.
[9] Shun'ichi Amari. Understand It in 5 Minutes!? Skimming Famous Papers: Jacot, Arthur, Gabriel, Franck and Hongler, Clément: Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2020.
[10] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[11] Li Chen, et al. Accelerating Federated Learning via Momentum Gradient Descent, 2019, IEEE Transactions on Parallel and Distributed Systems.
[12] Konstantin Mishchenko, et al. Tighter Theory for Local SGD on Identical and Heterogeneous Data, 2020, AISTATS.
[13] Soham De, et al. On the Origin of Implicit Regularization in Stochastic Gradient Descent, 2021, ICLR.
[14] Martin Jaggi, et al. Extrapolation for Large-batch Training in Deep Learning, 2020, ICML.
[15] Gregory Cohen, et al. EMNIST: Extending MNIST to handwritten letters, 2017, 2017 International Joint Conference on Neural Networks (IJCNN).
[16] David G.T. Barrett, et al. Implicit Gradient Regularization, 2020, ArXiv.
[17] Mark W. Schmidt, et al. Fast Convergence of Stochastic Gradient Descent under a Strong Growth Condition, 2013, ArXiv:1308.6370.
[18] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[19] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[20] Peter Richtárik, et al. Federated Optimization: Distributed Machine Learning for On-Device Intelligence, 2016, ArXiv.
[21] Satrajit Chatterjee, et al. Making Coherence Out of Nothing At All: Measuring the Evolution of Gradient Alignment, 2020, ArXiv.
[22] Anit Kumar Sahu, et al. Federated Optimization in Heterogeneous Networks, 2018, MLSys.
[23] H. Robbins. A Stochastic Approximation Method, 1951.
[24] Alexander J. Smola, et al. Parallelized Stochastic Gradient Descent, 2010, NIPS.
[25] Dimitris S. Papailiopoulos, et al. Gradient Diversity: a Key Ingredient for Scalable Distributed Learning, 2017, AISTATS.
[26] Blaise Agüera y Arcas, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data, 2016, AISTATS.
[27] Srini Narayanan, et al. Stiffness: A New Perspective on Generalization in Neural Networks, 2019, ArXiv.
[28] Tzu-Ming Harry Hsu, et al. Measuring the Effects of Non-Identical Data Distribution for Federated Visual Classification, 2019, ArXiv.
[29] Kurt Keutzer, et al. Hessian-based Analysis of Large Batch Training and Robustness to Adversaries, 2018, NeurIPS.
[30] Stefano Soatto, et al. Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks, 2017, 2018 Information Theory and Applications Workshop (ITA).
[31] Marc'Aurelio Ranzato, et al. Large Scale Distributed Deep Networks, 2012, NIPS.
[32] Manzil Zaheer, et al. Adaptive Federated Optimization, 2020, ICLR.
[33] Sashank J. Reddi, et al. SCAFFOLD: Stochastic Controlled Averaging for Federated Learning, 2019, ICML.
[34] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[35] Tao Lin, et al. Don't Use Large Mini-Batches, Use Local SGD, 2018, ICLR.
[36] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[37] Sebastian Caldas, et al. LEAF: A Benchmark for Federated Settings, 2018, ArXiv.
[38] Quoc V. Le, et al. A Bayesian Perspective on Generalization and Stochastic Gradient Descent, 2017, ICLR.
[39] Joshua Achiam, et al. On First-Order Meta-Learning Algorithms, 2018, ArXiv.
[40] Ohad Shamir, et al. Is Local SGD Better than Minibatch SGD?, 2020, ICML.
[41] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.