Liu Ziyin | Kangqiao Liu | Takashi Mori | Masahito Ueda
[1] P. Stoica, et al. On the expectation of the product of four matrix-valued Gaussian random variables, 1988.
[2] R. Tweedie, et al. Exponential convergence of Langevin distributions and their discrete approximations, 1996.
[3] M. Levy, et al. Power laws are logarithmic Boltzmann laws, 1996, adap-org/9607001.
[4] A. Benjamin, et al. Proofs that Really Count: The Art of Combinatorial Proof, 2003.
[5] Ioana Dumitriu, et al. Path Counting and Random Matrix Theory, 2003, Electron. J. Comb.
[6] D. Glass. Proofs That Really Count: The Art of Combinatorial Proof, 2004.
[7] Mark E. J. Newman, et al. Power-Law Distributions in Empirical Data, 2007, SIAM Rev.
[8] Yee Whye Teh, et al. Bayesian Learning via Stochastic Gradient Langevin Dynamics, 2011, ICML.
[9] Ahn, et al. Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring, 2012.
[10] Tianqi Chen, et al. Stochastic Gradient Hamiltonian Monte Carlo, 2014, ICML.
[11] Quoc V. Le, et al. Adding Gradient Noise Improves Learning for Very Deep Networks, 2015, ArXiv.
[12] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[13] Kaiming He, et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour, 2017, ArXiv.
[14] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[15] Matus Telgarsky, et al. Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis, 2017, COLT.
[16] Elad Hoffer, et al. Train longer, generalize better: closing the generalization gap in large batch training of neural networks, 2017, NIPS.
[17] David M. Blei, et al. Stochastic Gradient Descent as Approximate Bayesian Inference, 2017, J. Mach. Learn. Res.
[18] Yoshua Bengio, et al. Three Factors Influencing Minima in SGD, 2017, ArXiv.
[19] Tomaso A. Poggio, et al. Theory of Deep Learning IIb: Optimization Properties of SGD, 2018, ArXiv.
[20] Yoshua Bengio, et al. A Walk with SGD, 2018, ArXiv.
[21] Quoc V. Le, et al. A Bayesian Perspective on Generalization and Stochastic Gradient Descent, 2017, ICLR.
[22] Yann LeCun, et al. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks, 2018, ArXiv.
[23] Zhanxing Zhu, et al. The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects, 2018, ICML.
[24] Gaël Richard, et al. First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise, 2019, NeurIPS.
[25] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[26] Levent Sagun, et al. A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks, 2019, ICML.
[27] Christos Thrampoulidis, et al. A Model of Double Descent for High-dimensional Binary Linear Classification, 2019, Information and Inference: A Journal of the IMA.
[28] Praneeth Netrapalli, et al. Non-Gaussianity of Stochastic Gradient Noise, 2019, ArXiv.
[29] Zhi-Ming Ma, et al. Dynamic of Stochastic Gradient Descent with State-Dependent Noise, 2020, ArXiv.
[30] Farhad Pourpanah, et al. Recent advances in deep learning, 2020, International Journal of Machine Learning and Cybernetics.
[31] Zhanxing Zhu, et al. On the Noisy Gradient Descent that Generalizes as SGD, 2019, ICML.
[32] Masahito Ueda, et al. Stochastic Gradient Descent with Large Learning Rate, 2020, ArXiv.
[33] Andrew M. Saxe, et al. High-dimensional dynamics of generalization error in neural networks, 2017, Neural Networks.
[34] D. Tao, et al. Recent advances in deep learning theory, 2020, ArXiv.
[35] Masashi Sugiyama, et al. A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima, 2020, ICLR.
[36] Michael W. Mahoney, et al. Multiplicative noise and heavy tails in stochastic optimization, 2020, ICML.
[37] Colin Wei, et al. Shape Matters: Understanding the Implicit Bias of the Noise Covariance, 2020, COLT.
[38] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, Annals of Statistics.