[1] Jean-Louis Goffin, et al. On convergence rates of subgradient optimization methods, 1977, Math. Program.
[2] Prateek Jain, et al. Parallelizing Stochastic Approximation Through Mini-Batching and Tail-Averaging, 2016, ArXiv.
[3] Richard Socher, et al. A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation, 2018, ICLR.
[4] H. Robbins, et al. A Stochastic Approximation Method, 1951, Ann. Math. Stat.
[5] Ruosong Wang, et al. On Exact Computation with an Infinitely Wide Neural Net, 2019, NeurIPS.
[6] Yossi Arjevani, et al. Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry, 2020, NeurIPS.
[7] Francis R. Bach, et al. Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions, 2015, AISTATS.
[8] Léon Bottou, et al. The Tradeoffs of Large Scale Learning, 2007, NIPS.
[9] Frank Hutter, et al. SGDR: Stochastic Gradient Descent with Warm Restarts, 2016, ICLR.
[10] Francesco Orabona, et al. A Second Look at Exponential and Cosine Step Sizes: Simplicity, Convergence, and Performance, 2020, ArXiv.
[11] S. Mitra, et al. Matrix Partial Orders, Shorted Operators and Applications, 2010.
[12] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[13] Sham M. Kakade, et al. Competing with the Empirical Risk Minimizer in a Single Pass, 2014, COLT.
[14] Eric Moulines, et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), 2013, NIPS.
[15] Yann LeCun, et al. Eigenvalues of the Hessian in Deep Learning: Singularity and Beyond, 2016, ArXiv.
[16] Lorenzo Rosasco, et al. Iterate averaging as regularization for stochastic gradient descent, 2018, COLT.
[17] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[18] Michael W. Mahoney, et al. PyHessian: Neural Networks Through the Lens of the Hessian, 2019, 2020 IEEE International Conference on Big Data (Big Data).
[19] Li Fei-Fei, et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[20] Prateek Jain, et al. Accelerating Stochastic Gradient Descent, 2017, COLT.
[21] Sham M. Kakade, et al. The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure, 2019, NeurIPS.
[22] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.