Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs