Global Convergence of Block Coordinate Descent in Deep Learning