Yaoyu Zhang | Tao Luo | Zhongwang Zhang | Zhi-Qin John Xu
[1] Yuanzhi Li, et al. A Convergence Theory for Deep Learning via Over-Parameterization, 2018, ICML.
[2] Liwei Wang, et al. Gradient Descent Finds Global Minima of Deep Neural Networks, 2018, ICML.
[3] Francis Bach, et al. On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport, 2018, NeurIPS.
[4] Yoh-ichi Mototake, et al. Semi-flat minima and saddle points by embedding neural networks to overparameterization, 2019, NeurIPS.
[5] Joan Bruna, et al. Gradient Dynamics of Shallow Univariate ReLU Networks, 2019, NeurIPS.
[6] Yoshua Bengio, et al. A Closer Look at Memorization in Deep Networks, 2017, ICML.
[7] George Em Karniadakis, et al. Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness, 2019, Neural Networks.
[8] M. Wyart, et al. Disentangling feature and lazy training in deep neural networks, 2019, Journal of Statistical Mechanics: Theory and Experiment.
[9] Joar Skalse, et al. Neural networks are a priori biased towards Boolean functions with low entropy, 2019, ArXiv.
[10] Grant M. Rotskoff, et al. Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks, 2018, NeurIPS.
[11] E Weinan, et al. The Quenching-Activation Behavior of the Gradient Descent Dynamics for Two-layer Neural Network Models, 2020, ArXiv.
[12] Shilin He, et al. Assessing the Bilingual Knowledge Learned by Neural Machine Translation Models, 2020, ArXiv.
[13] L. Breiman. Reflections After Refereeing Papers for NIPS, 2018.
[14] Zheng Ma, et al. A type of generalization error induced by initialization in deep neural networks, 2019, MSML.
[15] Zheng Ma, et al. Frequency Principle: Fourier Analysis Sheds Light on Deep Neural Networks, 2019, Communications in Computational Physics.
[16] Justin A. Sirignano, et al. Mean field analysis of neural networks: A central limit theorem, 2018, Stochastic Processes and their Applications.
[17] Yang Yuan, et al. Asymmetric Valleys: Beyond Sharp and Flat Local Minima, 2019, NeurIPS.
[18] Wulfram Gerstner, et al. Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances, 2021, ICML.
[19] R. Fisher. The Use of Multiple Measurements in Taxonomic Problems, 1936.
[20] Yaoyu Zhang, et al. Embedding Principle: a hierarchical structure of loss landscape of deep neural networks, 2021, Journal of Machine Learning.
[21] Jorge Nocedal, et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[22] Lei Wu, et al. Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes, 2017, ArXiv.
[23] Lei Wu, et al. A comparative analysis of optimization and generalization properties of two-layer neural network and random feature models under gradient descent dynamics, 2019, Science China Mathematics.
[24] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[25] Surya Ganguli, et al. Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel, 2020, NeurIPS.
[26] Ruosong Wang, et al. On Exact Computation with an Infinitely Wide Neural Net, 2019, NeurIPS.
[27] Arthur Jacot, et al. Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2018, NeurIPS.
[28] Yaim Cooper. Global Minima of Overparameterized Neural Networks, 2021, SIAM J. Math. Data Sci.
[29] Yann LeCun, et al. Singularity of the Hessian in Deep Learning, 2016, ArXiv.
[30] Andrea Montanari, et al. A mean field view of the landscape of two-layer neural networks, 2018, Proceedings of the National Academy of Sciences.
[31] Fred Zhang, et al. SGD on Neural Networks Learns Functions of Increasing Complexity, 2019, NeurIPS.
[32] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[33] Lei Wu, et al. Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't, 2020, CSIAM Transactions on Applied Mathematics.
[34] Michael Carbin, et al. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, 2018, ICLR.
[35] Zhi-Qin John Xu, et al. Training behavior of deep neural network in frequency domain, 2018, ICONIP.
[36] Mikhail Burtsev, et al. Loss Landscape Sightseeing with Multi-Point Optimization, 2019, ArXiv.