David J. Fleet | David Duvenaud | Jimmy Ba | Fartash Faghri