Deep Neural Networks with Multi-Branch Architectures Are Intrinsically Less Non-Convex
Hongyang Zhang | Junru Shao | Ruslan Salakhutdinov
[1] R. Starr. Quasi-Equilibria in Markets with Non-Convex Preferences, 1969.
[2] M. Wagner, et al. Generalized Linear Programming Solves the Dual, 1976.
[3] D. Bertsekas, et al. Estimates of the duality gap for large-scale separable nonconvex optimization problems, 1982, 21st IEEE Conference on Decision and Control.
[4] Kurt Hornik, et al. Neural networks and principal component analysis: Learning from examples without local minima, 1989, Neural Networks.
[5] Ronald L. Rivest, et al. Training a 3-node neural network is NP-complete, 1988, COLT '88.
[6] Michael L. Overton, et al. On the Sum of the Largest Eigenvalues of a Symmetric Matrix, 1992, SIAM J. Matrix Anal. Appl.
[7] Hava T. Siegelmann, et al. On the complexity of training neural networks with continuous activation functions, 1995, IEEE Trans. Neural Networks.
[8] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[9] P. Bartlett, et al. Hardness results for neural network approximation problems, 1999, Theor. Comput. Sci.
[10] Yonina C. Eldar, et al. Strong Duality in Nonconvex Quadratic Optimization with Two Quadratic Constraints, 2006, SIAM J. Optim.
[11] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[12] Bingsheng He, et al. On the O(1/n) Convergence Rate of the Douglas-Rachford Alternating Direction Method, 2012, SIAM J. Numer. Anal.
[13] Pierre Baldi, et al. Complex-Valued Autoencoders, 2011, Neural Networks.
[14] Surya Ganguli, et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[15] Xinhua Zhang, et al. Convex Deep Learning via Normalized Kernels, 2014, NIPS.
[16] Ruslan Salakhutdinov, et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks, 2015, NIPS.
[17] Yann LeCun, et al. Open Problem: The landscape of the loss surfaces of multilayer networks, 2015, COLT.
[18] Geoffrey E. Hinton, et al. Deep Learning, 2015, Nature.
[19] Furong Huang, et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition, 2015, COLT.
[20] Yann LeCun, et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[21] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[22] Kenji Kawaguchi, et al. Deep Learning without Poor Local Minima, 2016, NIPS.
[23] Nikos Komodakis, et al. Wide Residual Networks, 2016, BMVC.
[24] Maria-Florina Balcan, et al. Learning and 1-bit Compressed Sensing under Asymmetric Noise, 2016, COLT.
[25] Serge J. Belongie, et al. Residual Networks Behave Like Ensembles of Relatively Shallow Networks, 2016, NIPS.
[26] A. Tang, et al. Refined Shapley-Folkman Lemma and Its Application in Duality Gap Estimation, 2016, ArXiv:1610.05416.
[27] Stephen P. Boyd, et al. Generalized Low Rank Models, 2014, Found. Trends Mach. Learn.
[28] Anima Anandkumar, et al. Efficient approaches for escaping higher order saddle points in non-convex optimization, 2016, COLT.
[29] Forrest N. Iandola, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size, 2016, ArXiv.
[30] Daniel Soudry, et al. No bad local minima: Data independent training error guarantees for multilayer neural networks, 2016, ArXiv.
[31] Stephen P. Boyd, et al. Bounding duality gap for separable problems with linear constraints, 2014, Comput. Optim. Appl.
[32] René Vidal, et al. Global Optimality in Neural Network Training, 2017, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Xavier Gastaldi, et al. Shake-Shake regularization, 2017, ArXiv.
[34] Joan Bruna, et al. Mathematics of Deep Learning, 2017, ArXiv.
[35] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[36] François Chollet, et al. Xception: Deep Learning with Depthwise Separable Convolutions, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Martin J. Wainwright, et al. Convexified Convolutional Neural Networks, 2016, ICML.
[38] Maria-Florina Balcan, et al. Sample and Computationally Efficient Learning Algorithms under S-Concave Distributions, 2017, NIPS.
[39] Tengyu Ma, et al. Identity Matters in Deep Learning, 2016, ICLR.
[40] A. d'Aspremont, et al. An Approximate Shapley-Folkman Theorem, 2017, ArXiv:1712.08559.
[41] Martin J. Wainwright, et al. On the Learnability of Fully-Connected Neural Networks, 2017, AISTATS.
[42] Ohad Shamir, et al. Failures of Gradient-Based Deep Learning, 2017, ICML.
[43] Haihao Lu, et al. Depth Creates No Bad Local Minima, 2017, ArXiv.
[44] Sergey Ioffe, et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 2016, AAAI.
[45] Le Song, et al. Diverse Neural Network Learns True Target Functions, 2016, AISTATS.
[46] Michael I. Jordan, et al. How to Escape Saddle Points Efficiently, 2017, ICML.
[47] Zhuowen Tu, et al. Aggregated Residual Transformations for Deep Neural Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Francis R. Bach, et al. Breaking the Curse of Dimensionality with Convex Neural Networks, 2014, J. Mach. Learn. Res.
[49] Amir Globerson, et al. Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs, 2017, ICML.
[50] Tengyu Ma, et al. Learning One-hidden-layer Neural Networks with Landscape Design, 2017, ICLR.
[51] R. Srikant, et al. Adding One Neuron Can Eliminate All Bad Local Minima, 2018, NeurIPS.
[52] R. Srikant, et al. Understanding the Loss Surface of Neural Networks for Binary Classification, 2018, ICML.
[53] David P. Woodruff, et al. Matrix Completion and Related Problems via Strong Duality, 2017, ITCS.
[54] Le Song, et al. Deep Semi-Random Features for Nonlinear Function Approximation, 2017, AAAI.
[55] Hao Li, et al. Visualizing the Loss Landscape of Neural Nets, 2017, NeurIPS.
[56] Thomas Laurent, et al. The Multilinear Structure of ReLU Networks, 2017, ICML.
[57] Hongyang Zhang, et al. Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations, 2017, COLT.
[58] Ohad Shamir, et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks, 2017, ICML.
[59] Anders Rantzer, et al. Low-Rank Optimization With Convex Constraints, 2016, IEEE Transactions on Automatic Control.
[60] Jiashi Feng, et al. Empirical Risk Landscape Analysis for Understanding Deep Neural Networks, 2018, ICLR.
[61] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[62] Jason D. Lee, et al. On the Power of Over-parametrization in Neural Networks with Quadratic Activation, 2018, ICML.
[63] Pengtao Xie, et al. Stackelberg GAN: Towards Provable Minimax Equilibrium via Multi-Generator Architectures, 2018, ArXiv.
[64] Sanjeev Arora, et al. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization, 2018, ICML.
[65] Yann LeCun, et al. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks, 2018, ArXiv.
[66] Michael I. Jordan, et al. Theoretically Principled Trade-off between Robustness and Accuracy, 2019, ICML.
[67] Mengdi Wang, et al. Blessing of massive scale: spatial graphical model estimation with a total cardinality constraint approach, 2018, Math. Program.
[68] Adel Javanmard, et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks, 2017, IEEE Transactions on Information Theory.