Improved Linear Convergence of Training CNNs With Generalizability Guarantees: A One-Hidden-Layer Case
Jinjun Xiong | Meng Wang | Pin-Yu Chen | Sijia Liu | Shuai Zhang