Liu Ziyin | Kangqiao Liu | Takashi Mori | Masahito Ueda
[1] Renato Renner, et al. Discovering physical concepts with neural networks, 2018, Physical Review Letters.
[2] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[3] Zhanxing Zhu, et al. The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects, 2018, ICML.
[4] H. Kramers. Brownian motion in a field of force and the diffusion model of chemical reactions, 1940.
[5] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[6] Alireza Seif, et al. Machine learning the thermodynamic arrow of time, 2019, Nature Physics.
[7] Jason Weston, et al. A unified architecture for natural language processing: deep neural networks with multitask learning, 2008, ICML '08.
[8] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.
[9] Vardan Papyan, et al. Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians, 2019, ICML.
[10] Yuchen Zhang, et al. A Hitting Time Analysis of Stochastic Gradient Langevin Dynamics, 2017, COLT.
[11] Weinan E, et al. Stochastic Modified Equations and Adaptive Stochastic Gradient Algorithms, 2015, ICML.
[12] Quoc V. Le, et al. A Bayesian Perspective on Generalization and Stochastic Gradient Descent, 2017, ICLR.
[13] David J. C. MacKay, et al. Bayesian Model Comparison and Backprop Nets, 1991, NIPS.
[14] Pushmeet Kohli, et al. Unveiling the predictive power of static structure in glassy systems, 2020.
[15] Zhanxing Zhu, et al. On the Noisy Gradient Descent that Generalizes as SGD, 2019, ICML.
[16] Hiroshi Nakagawa, et al. Approximation Analysis of Stochastic Gradient Langevin Dynamics by using Fokker-Planck Equation and Ito Process, 2014, ICML.
[17] Lei Wu. How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective, 2018.
[18] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[19] Elad Hoffer, et al. Train longer, generalize better: closing the generalization gap in large batch training of neural networks, 2017, NIPS.
[20] Masashi Sugiyama, et al. A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima, 2020, ICLR.
[21] J. Langer. Statistical theory of the decay of metastable states, 1969.