Effect of Depth and Width on Local Minima in Deep Learning
Kenji Kawaguchi | Jiaoyang Huang | Leslie Pack Kaelbling