暂无分享,去创建一个
[1] A. d'Aspremont,et al. An Approximate Shapley-Folkman Theorem , 2017, 1712.08559.
[2] Yann LeCun,et al. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks , 2018, ArXiv.
[3] Zhuowen Tu,et al. Aggregated Residual Transformations for Deep Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Stephen P. Boyd,et al. Bounding duality gap for separable problems with linear constraints , 2014, Comput. Optim. Appl..
[5] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[6] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[7] R. Srikant,et al. Adding One Neuron Can Eliminate All Bad Local Minima , 2018, NeurIPS.
[8] René Vidal,et al. Global Optimality in Neural Network Training , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Ohad Shamir,et al. Failures of Gradient-Based Deep Learning , 2017, ICML.
[10] Sanjeev Arora,et al. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization , 2018, ICML.
[11] David P. Woodruff,et al. Matrix Completion and Related Problems via Strong Duality , 2017, ITCS.
[12] Hao Li,et al. Visualizing the Loss Landscape of Neural Nets , 2017, NeurIPS.
[13] Tengyu Ma,et al. Learning One-hidden-layer Neural Networks with Landscape Design , 2017, ICLR.
[14] Pierre Baldi,et al. Complex-Valued Autoencoders , 2011, Neural Networks.
[15] Hongyang Zhang,et al. Algorithmic Regularization in Over-parameterized Matrix Sensing and Neural Networks with Quadratic Activations , 2017, COLT.
[16] D. Bertsekas,et al. Estimates of the duality gap for large-scale separable nonconvex optimization problems , 1982, 1982 21st IEEE Conference on Decision and Control.
[17] Haihao Lu,et al. Depth Creates No Bad Local Minima , 2017, ArXiv.
[18] Yonina C. Eldar,et al. Strong Duality in Nonconvex Quadratic Optimization with Two Quadratic Constraints , 2006, SIAM J. Optim..
[19] Hava T. Siegelmann,et al. On the complexity of training neural networks with continuous activation functions , 1995, IEEE Trans. Neural Networks.
[20] Tengyu Ma,et al. Identity Matters in Deep Learning , 2016, ICLR.
[21] Thomas Laurent,et al. The Multilinear Structure of ReLU Networks , 2017, ICML.
[22] Daniel Soudry,et al. No bad local minima: Data independent training error guarantees for multilayer neural networks , 2016, ArXiv.
[23] Joan Bruna,et al. Mathematics of Deep Learning , 2017, ArXiv.
[24] Serge J. Belongie,et al. Residual Networks Behave Like Ensembles of Relatively Shallow Networks , 2016, NIPS.
[25] Ruslan Salakhutdinov,et al. Path-SGD: Path-Normalized Optimization in Deep Neural Networks , 2015, NIPS.
[26] Xavier Gastaldi,et al. Shake-Shake regularization , 2017, ArXiv.
[27] Shai Shalev-Shwartz,et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data , 2017, ICLR.
[28] Ohad Shamir,et al. Spurious Local Minima are Common in Two-Layer ReLU Neural Networks , 2017, ICML.
[29] Shai Ben-David,et al. Hardness Results for Neural Network Approximation Problems , 1999, EuroCOLT.
[30] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[31] Bingsheng He,et al. On the O(1/n) Convergence Rate of the Douglas-Rachford Alternating Direction Method , 2012, SIAM J. Numer. Anal..
[32] Jason D. Lee,et al. On the Power of Over-parametrization in Neural Networks with Quadratic Activation , 2018, ICML.
[33] Nicolas Le Roux,et al. Convex Neural Networks , 2005, NIPS.
[34] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.
[35] Kurt Hornik,et al. Neural networks and principal component analysis: Learning from examples without local minima , 1989, Neural Networks.
[36] Ronald L. Rivest,et al. Training a 3-node neural network is NP-complete , 1988, COLT '88.
[37] Francis R. Bach,et al. Breaking the Curse of Dimensionality with Convex Neural Networks , 2014, J. Mach. Learn. Res..
[38] Maria-Florina Balcan,et al. Learning and 1-bit Compressed Sensing under Asymmetric Noise , 2016, COLT.
[39] Furong Huang,et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.
[40] Nikos Komodakis,et al. Wide Residual Networks , 2016, BMVC.
[41] Stephen P. Boyd,et al. Generalized Low Rank Models , 2014, Found. Trends Mach. Learn..
[42] Le Song,et al. Diverse Neural Network Learns True Target Functions , 2016, AISTATS.
[43] Yann LeCun,et al. Open Problem: The landscape of the loss surfaces of multilayer networks , 2015, COLT.
[44] Adel Javanmard,et al. Theoretical Insights Into the Optimization Landscape of Over-Parameterized Shallow Neural Networks , 2017, IEEE Transactions on Information Theory.
[45] Anders Rantzer,et al. Low-Rank Optimization With Convex Constraints , 2016, IEEE Transactions on Automatic Control.
[46] A. Tang,et al. Refined Shapely-Folkman Lemma and Its Application in Duality Gap Estimation , 2016, 1610.05416.
[47] Michael I. Jordan,et al. How to Escape Saddle Points Efficiently , 2017, ICML.
[48] Kenji Kawaguchi,et al. Deep Learning without Poor Local Minima , 2016, NIPS.
[49] Anima Anandkumar,et al. Efficient approaches for escaping higher order saddle points in non-convex optimization , 2016, COLT.
[50] Mengdi Wang,et al. Blessing of massive scale: spatial graphical model estimation with a total cardinality constraint approach , 2018, Math. Program..
[51] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, AISTATS.
[52] François Chollet,et al. Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Forrest N. Iandola,et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.
[54] Jiashi Feng,et al. Empirical Risk Landscape Analysis for Understanding Deep Neural Networks , 2018, ICLR.
[55] Xinhua Zhang,et al. Convex Deep Learning via Normalized Kernels , 2014, NIPS.