[1] V. Arnold. Mathematical Methods of Classical Mechanics, 1974.
[2] D. Talay, et al. The law of the Euler scheme for stochastic differential equations, 1996, Monte Carlo Methods Appl.
[3] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.
[4] Michael I. Jordan, et al. Understanding the acceleration phenomenon via high-resolution differential equations, 2018, Mathematical Programming.
[5] G. N. Mil’shtejn. Approximate Integration of Stochastic Differential Equations, 1975.
[6] M. Ledoux, et al. Analysis and Geometry of Markov Diffusion Operators, 2013.
[7] Andre Wibisono, et al. A variational perspective on accelerated methods in optimization, 2016, Proceedings of the National Academy of Sciences.
[8] Levent Sagun, et al. A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks, 2019, ICML.
[9] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[10] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[11] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[12] Stefano Soatto, et al. Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks, 2017, 2018 Information Theory and Applications Workshop (ITA).
[13] M. Freidlin, et al. Random Perturbations of Dynamical Systems, 1984.
[14] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[15] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Peter L. Bartlett, et al. Acceleration and Averaging in Stochastic Descent Dynamics, 2017, NIPS.
[18] Matus Telgarsky, et al. Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis, 2017, COLT.
[19] J. Marsden, et al. A Mathematical Introduction to Fluid Mechanics, 1979.
[20] Michael I. Jordan, et al. Gradient Descent Only Converges to Minimizers, 2016, COLT.
[21] Sanjeev Arora, et al. An Exponential Learning Rate Schedule for Deep Learning, 2020, ICLR.
[22] Michael I. Jordan, et al. Generalized Momentum-Based Methods: A Hamiltonian Perspective, 2019, SIAM J. Optim.
[23] Stephen P. Boyd, et al. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights, 2014, J. Mach. Learn. Res.
[24] F. Nier, et al. Hypoelliptic Estimates and Spectral Theory for Fokker-Planck Operators and Witten Laplacians, 2005.
[25] David M. Blei, et al. A Variational Analysis of Stochastic Gradient Algorithms, 2016, ICML.
[26] Michael Hitrik, et al. Tunnel effect and symmetries for Kramers–Fokker–Planck type operators, 2010, Journal of the Institute of Mathematics of Jussieu.
[27] Michael I. Jordan. Dynamical, Symplectic and Stochastic Perspectives on Gradient-Based Optimization, 2019, Proceedings of the International Congress of Mathematicians (ICM 2018).
[28] D. Talay, et al. Discretization and simulation of stochastic differential equations, 1985.
[29] Michael I. Jordan, et al. How Does Learning Rate Decay Help Modern Neural Networks?, 2019.
[30] F. Nier. Quantitative analysis of metastability in reversible diffusion processes via a Witten complex approach, 2004.
[31] Yoshua Bengio, et al. Three Factors Influencing Minima in SGD, 2017, ArXiv.
[32] S. Varadhan, et al. Large deviations, 2019, Graduate Studies in Mathematics.
[33] P. Kloeden, et al. The approximation of multiple stochastic integrals, 1992.
[34] Yoshua Bengio, et al. Practical Recommendations for Gradient-Based Training of Deep Architectures, 2012, Neural Networks: Tricks of the Trade.
[35] Ruoyu Sun, et al. Optimization for deep learning: theory and algorithms, 2019, ArXiv.
[36] Nathan Srebro, et al. Characterizing Implicit Bias in Terms of Optimization Geometry, 2018, ICML.
[37] P. Lions. Generalized Solutions of Hamilton-Jacobi Equations, 1982.
[38] Leslie N. Smith, et al. Cyclical Learning Rates for Training Neural Networks, 2015, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).
[39] Jessika Eichel, et al. Partial Differential Equations, Second Edition, 2016.
[40] Dmitriy Drusvyatskiy, et al. Stochastic algorithms with geometric step decay converge linearly on sharp functions, 2019, Mathematical Programming.
[41] P. Cannarsa, et al. Semiconcave Functions, Hamilton-Jacobi Equations, and Optimal Control, 2004.
[42] Yoshua Bengio, et al. On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length, 2018, ICLR.
[43] A. S. Kronfeld, et al. Dynamics of Langevin simulations, 1992, hep-lat/9205008.
[44] Vladimir Igorevich Arnold. Geometrical Methods in the Theory of Ordinary Differential Equations, 1983.
[45] Michael I. Jordan, et al. How to Escape Saddle Points Efficiently, 2017, ICML.
[46] Hermano Frid, et al. Vanishing Viscosity Limit for Initial-Boundary Value Problems for Conservation Laws, 1999.
[47] V. Arnold, et al. Topological Methods in Hydrodynamics, 1998.
[48] S. Zienau. Quantum Physics, 1969, Nature.
[49] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[50] Colin Wei, et al. Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks, 2019, NeurIPS.
[51] Stefano Soatto, et al. Deep relaxation: partial differential equations for optimizing deep neural networks, 2017, Research in the Mathematical Sciences.
[52] E Weinan, et al. Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms, 2015, ICML.
[53] Cédric Villani, et al. Hypocoercive Diffusion Operators, 2006.
[54] Kenneth F. Caluya, et al. Gradient Flow Algorithms for Density Propagation in Stochastic Systems, 2019, IEEE Transactions on Automatic Control.
[55] H. Kushner, et al. Stochastic Approximation and Recursive Algorithms and Applications, 2003.
[56] P. Lions, et al. Some Properties of Viscosity Solutions of Hamilton-Jacobi Equations, 1984.
[57] Quoc V. Le, et al. Don't Decay the Learning Rate, Increase the Batch Size, 2017, ICLR.
[58] Yuchen Zhang, et al. A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics, 2017, COLT.
[59] C. Hwang. Laplace's Method Revisited: Weak Convergence of Probability Measures, 1980.
[60] Denis Talay, et al. Efficient numerical schemes for the approximation of expectations of functionals of the solution of a S.D.E., and applications, 1984.
[61] Israel Michael Sigal, et al. Introduction to Spectral Theory: With Applications to Schrödinger Operators, 1995.
[62] Weijie Su, et al. Robust Learning Rate Selection for Stochastic Optimization via Splitting Diagnostic, 2019, ArXiv.
[63] C. Villani, et al. On the trend to equilibrium for the Fokker-Planck equation: an interplay between physics and functional analysis, 2004.
[64] Peter L. Bartlett, et al. Adaptive Online Gradient Descent, 2007, NIPS.
[65] A. Bovier, et al. Metastability in reversible diffusion processes II. Precise asymptotics for small eigenvalues, 2005.
[66] L. Evans. On solving certain nonlinear partial differential equations by accretive operator methods, 1980.
[67] G. Mil’shtein. Weak Approximation of Solutions of Systems of Stochastic Differential Equations, 1986.
[68] Hangfeng He, et al. The Local Elasticity of Neural Networks, 2020, ICLR.
[69] Frank Hutter, et al. SGDR: Stochastic Gradient Descent with Warm Restarts, 2016, ICLR.
[70] S. Zagatti. On viscosity solutions of Hamilton-Jacobi equations, 2008.
[71] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[72] Simon M. J. Lyons. Introduction to stochastic differential equations, 2011.
[73] Laurent Michel, et al. About small eigenvalues of the Witten Laplacian, 2017, Pure and Applied Analysis.
[74] F. Bach, et al. Bridging the gap between constant step size stochastic gradient descent and Markov chains, 2017, The Annals of Statistics.
[75] Denis Talay. Résolution trajectorielle et analyse numérique des équations différentielles stochastiques [Pathwise solution and numerical analysis of stochastic differential equations], 1983.
[76] P. K. Kundu, et al. Fluid Mechanics, Fourth Edition, 2008.