On ADMM in Deep Learning: Convergence and Saturation-Avoidance