Depth with Nonlinearity Creates No Bad Local Minima in ResNets