How regularization affects the critical points in linear networks
[1] Yann LeCun, et al. Open Problem: The landscape of the loss surfaces of multilayer networks, 2015, COLT.
[2] N. Higham. Functions of Matrices, 2008.
[3] Stephen P. Boyd, et al. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights, 2014, J. Mach. Learn. Res.
[4] Nathan Srebro, et al. Global Optimality of Local Search for Low Rank Matrix Recovery, 2016, NIPS.
[5] Yann Le Cun, et al. A Theoretical Framework for Back-Propagation, 1988.
[6] Kurt Hornik, et al. Learning in linear neural networks: a survey, 1995, IEEE Trans. Neural Networks.
[7] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[8] Andre Wibisono, et al. A variational perspective on accelerated methods in optimization, 2016, Proceedings of the National Academy of Sciences.
[9] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[10] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[11] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[12] Kurt Hornik, et al. Neural networks and principal component analysis: Learning from examples without local minima, 1989, Neural Networks.
[13] Daniel Soudry, et al. No bad local minima: Data independent training error guarantees for multilayer neural networks, 2016, ArXiv.
[14] Michael I. Jordan, et al. Gradient Descent Converges to Minimizers, 2016, ArXiv.
[15] Tengyu Ma, et al. Identity Matters in Deep Learning, 2016, ICLR.
[16] Surya Ganguli, et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization, 2014, NIPS.
[17] Thomas Kailath, et al. A general weight matrix formulation using optimal control, 1991, IEEE Trans. Neural Networks.
[18] Kenji Kawaguchi, et al. Deep Learning without Poor Local Minima, 2016, NIPS.
[19] Ryota Tomioka, et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning, 2014, ICLR.
[20] Nathan Srebro, et al. Implicit Regularization in Matrix Factorization, 2017, 2018 Information Theory and Applications Workshop (ITA).
[21] W. Culver. On the existence and uniqueness of the real logarithm of a matrix, 1966.
[22] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.