[1] P. Absil,et al. Erratum to: "Global rates of convergence for nonconvex optimization on manifolds" , 2016, IMA Journal of Numerical Analysis.
[2] Razvan Pascanu,et al. Sharp Minima Can Generalize For Deep Nets , 2017, ICML.
[3] Elad Hoffer,et al. Train longer, generalize better: closing the generalization gap in large batch training of neural networks , 2017, NIPS.
[4] Michael I. Jordan,et al. Advances in Neural Information Processing Systems 30 , 2017 .
[5] Xianglong Liu,et al. Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent Stiefel Manifolds in Deep Neural Networks , 2017, AAAI.
[6] Yoshua Bengio,et al. Three Factors Influencing Minima in SGD , 2017, ArXiv.
[7] Chunpeng Wu,et al. SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning , 2018, ArXiv abs/1805.07898.
[8] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[9] Xiaolin Li,et al. Generalized Batch Normalization: Towards Accelerating Deep Neural Networks , 2018, AAAI.
[10] Liwei Wang,et al. Gradient Descent Finds Global Minima of Deep Neural Networks , 2018, ICML.
[11] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[12] Stefano Soatto,et al. Entropy-SGD: biasing gradient descent into wide valleys , 2016, ICLR.
[13] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[14] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[15] Robert E. Mahony,et al. Optimization Algorithms on Matrix Manifolds , 2007 .
[16] Yuanzhi Li,et al. A Convergence Theory for Deep Learning via Over-Parameterization , 2018, ICML.
[17] Huirong Shi,et al. Two cases of Robert's uterus , 2009 .
[18] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, AISTATS.
[19] David W. Jacobs,et al. Automated Inference with Adaptive Batches , 2017, AISTATS.
[20] Lei Wu,et al. How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective , 2018, NeurIPS.
[21] Nathan Srebro,et al. Exploring Generalization in Deep Learning , 2017, NIPS.
[22] Minhyung Cho,et al. Riemannian approach to batch normalization , 2017, NIPS.
[23] Hao Li,et al. Visualizing the Loss Landscape of Neural Nets , 2017, NeurIPS.
[24] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[25] Pierre-Antoine Absil,et al. Joint Diagonalization on the Oblique Manifold for Independent Component Analysis , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[26] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[27] Huan Wang,et al. Identifying Generalization Properties in Neural Networks , 2018, ArXiv.
[28] Manfred K. Warmuth. Proceedings of the seventh annual conference on Computational learning theory , 1994, COLT 1994.