[1] Masashi Sugiyama, et al. A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima, 2020, ICLR.
[2] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[3] Sebastian Mentemeier, et al. Tail behaviour of stationary solutions of random difference equations: the case of regular matrices, 2010, arXiv:1009.1728.
[4] Michael W. Mahoney, et al. Traditional and Heavy-Tailed Self Regularization in Neural Network Models, 2019, ICML.
[5] Adel Mohammadpour, et al. On estimating the tail index and the spectral measure of multivariate α-stable distributions, 2015.
[6] R. Srikant, et al. Finite-Time Error Bounds For Linear Stochastic Approximation and TD Learning, 2019, COLT.
[7] H. Bauke. Parameter estimation for power-law distributions by maximum likelihood methods, 2007, arXiv:0704.1867.
[8] Mark E. J. Newman, et al. Power-Law Distributions in Empirical Data, 2007, SIAM Rev.
[9] Thomas Mikosch, et al. Stochastic Models with Power-Law Tails, 2016.
[10] H. Kesten. Random difference equations and renewal theory for products of random matrices, 1973.
[11] F. Bach, et al. Bridging the gap between constant step size stochastic gradient descent and Markov chains, 2017, The Annals of Statistics.
[12] Stefano Soatto, et al. Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks, 2017, 2018 Information Theory and Applications Workshop (ITA).
[13] I. Pavlyukevich. Cooling down Lévy flights, 2007, arXiv:cond-mat/0701651.
[14] B. Øksendal. Stochastic differential equations: an introduction with applications, 1987.
[15] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[16] V. Zolotarev, et al. Chance and Stability, Stable Distributions and Their Applications, 1999.
[17] Michael I. Jordan, et al. Stochastic Gradient and Langevin Processes, 2019, ICML.
[18] Sham M. Kakade, et al. Competing with the Empirical Risk Minimizer in a Single Pass, 2014, COLT.
[19] C. Villani. Optimal Transport: Old and New, 2008.
[20] D. Buraczewski, et al. Asymptotics of stationary solutions of multivariate stochastic recursions with heavy tailed inputs and related limit theorems, 2010, arXiv:1011.1685.
[21] Rachel A. Ward, et al. Concentration inequalities for random matrix products, 2019, Linear Algebra and its Applications.
[22] P. Lévy. Théorie de l'addition des variables aléatoires, 1955.
[23] Sashank J. Reddi, et al. Why ADAM Beats SGD for Attention Models, 2019, arXiv.
[24] Richard Socher, et al. Improving Generalization Performance by Switching from Adam to SGD, 2017, arXiv.
[25] George Tzagkarakis, et al. Compressive Sensing of Temporally Correlated Sources Using Isotropic Multivariate Stable Laws, 2018, 26th European Signal Processing Conference (EUSIPCO).
[26] Zhanxing Zhu, et al. The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects, 2018, ICML.
[27] Andreas Veit, et al. Why are Adaptive Methods Good for Attention Models?, 2020, NeurIPS.
[28] G. Alsmeyer. On the stationary tail index of iterated random Lipschitz functions, 2014, arXiv:1409.2663.
[29] Pan Zhou, et al. Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning, 2020, NeurIPS.
[30] Michel L. Goldstein, et al. Problems with fitting to the power-law distribution, 2004, arXiv:cond-mat/0402322.
[31] G. B. Arous, et al. The Spectrum of Heavy Tailed Random Matrices, 2007, arXiv:0707.2159.
[32] D. Buraczewski, et al. On the rate of convergence in the Kesten renewal theorem, 2015.
[33] Wenqing Hu, et al. On the diffusion approximation of nonconvex stochastic gradient descent, 2017, Annals of Mathematical Sciences and Applications.
[34] C. Goldie. Implicit Renewal Theory and Tails of Solutions of Random Equations, 1991.
[35] Joel A. Tropp, et al. Matrix Concentration for Products, 2020, Foundations of Computational Mathematics.
[36] Edgar Dobriban, et al. The Implicit Regularization of Stochastic Gradient Flow for Least Squares, 2020, ICML.
[37] Jascha Sohl-Dickstein, et al. The large learning rate phase of deep learning: the catapult mechanism, 2020, arXiv.
[38] Michael W. Mahoney, et al. Multiplicative noise and heavy tails in stochastic optimization, 2020, ICML.
[39] E Weinan, et al. Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms, 2015, ICML.
[40] Mert Gürbüzbalaban, et al. Global Convergence of Stochastic Gradient Hamiltonian Monte Carlo for Non-Convex Stochastic Optimization: Non-Asymptotic Performance Bounds and Momentum-Based Acceleration, 2018, Oper. Res.
[41] C. Klüppelberg, et al. Fractional Lévy-driven Ornstein–Uhlenbeck processes and stochastic differential equations, 2011, arXiv:1102.1830.
[42] Shai Ben-David, et al. Understanding Machine Learning: From Theory to Algorithms, 2014.
[43] Vygantas Paulauskas, et al. Once more on comparison of tail index estimators, 2011, arXiv:1104.1242.
[44] Praneeth Netrapalli, et al. Non-Gaussianity of Stochastic Gradient Noise, 2019, arXiv.
[45] Levent Sagun, et al. A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks, 2019, ICML.
[46] Gaël Richard, et al. On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks, 2019, arXiv.
[47] Charles M. Newman, et al. The distribution of Lyapunov exponents: Exact results for random matrices, 1986.
[48] Gaël Richard, et al. First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise, 2019, NeurIPS.
[49] Persi Diaconis, et al. Iterated Random Functions, 1999, SIAM Rev.
[50] Yoshua Bengio, et al. Three Factors Influencing Minima in SGD, 2017, arXiv.
[51] Alain Durmus, et al. Quantitative Propagation of Chaos for SGD in Wide Neural Networks, 2020, NeurIPS.
[52] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[53] Tong Zhang, et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, 2013, NIPS.
[54] Murat A. Erdogdu, et al. Hausdorff Dimension, Stochastic Differential Equations, and Generalization in Neural Networks, 2020, arXiv.
[55] Francis R. Bach, et al. Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression, 2016, J. Mach. Learn. Res.
[56] David M. Blei, et al. A Variational Analysis of Stochastic Gradient Algorithms, 2016, ICML.
[57] Mariusz Mirek. Heavy tail phenomenon and convergence to stable laws for iterated Lipschitz maps, 2009, arXiv:0907.2261.
[58] Sebastian Mentemeier, et al. On multidimensional Mandelbrot cascades, 2014.
[59] Prateek Jain, et al. Accelerating Stochastic Gradient Descent, 2017, arXiv.
[60] Léon Bottou, et al. On the Ineffectiveness of Variance Reduced Optimization for Deep Learning, 2018, NeurIPS.