DYNAMICAL, SYMPLECTIC AND STOCHASTIC PERSPECTIVES ON GRADIENT-BASED OPTIMIZATION
[1] V. G. Troitsky,et al. Journal of Mathematical Analysis and Applications , 1960 .
[2] Boris Polyak. Gradient methods for the minimisation of functionals , 1963 .
[3] O. Mangasarian. Pseudo-Convex Functions , 1965 .
[4] Kazuoki Azuma. Weighted sums of certain dependent random variables , 1967 .
[5] Richard A. Harshman,et al. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis , 1970 .
[6] A. Nemirovsky,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .
[7] Y. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2) , 1983 .
[8] Mihalis Yannakakis,et al. How easy is local search? , 1985, 26th Annual Symposium on Foundations of Computer Science (FOCS 1985).
[9] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[10] Jean-Francois Cardoso,et al. Source separation using higher order moments , 1989, International Conference on Acoustics, Speech, and Signal Processing.
[11] Kurt Hornik,et al. Neural networks and principal component analysis: Learning from examples without local minima , 1989, Neural Networks.
[12] S. Mitter,et al. Recursive stochastic algorithms for global optimization in R^d , 1991 .
[13] P. Pardalos. Complexity in numerical optimization , 1993 .
[14] D. Saad,et al. On-line learning in soft committee machines , 1995, Physical Review E.
[15] Alan M. Frieze,et al. Learning linear transformations , 1996, Proceedings of 37th Conference on Foundations of Computer Science.
[16] David J. Field,et al. Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.
[17] Magnus Rattray,et al. Natural gradient descent for on-line learning , 1998 .
[18] M. A. Hanson. Invexity and the Kuhn–Tucker Theorem , 1999 .
[19] Aapo Hyvärinen. Fast ICA for noisy data using Gaussian moments , 1999, ISCAS.
[20] Krzysztof C. Kiwiel,et al. Convergence and efficiency of subgradient methods for quasiconvex minimization , 2001, Math. Program..
[21] Tamara G. Kolda,et al. Orthogonal Tensor Decompositions , 2000, SIAM J. Matrix Anal. Appl..
[22] D K Smith,et al. Numerical Optimization , 2001, J. Oper. Res. Soc..
[23] Hyeyoung Park,et al. On-Line Learning Theory of Soft Committee Machines with Correlated Hidden Units: Steepest Gradient Descent and Natural Gradient Descent , 2002, cond-mat/0212006.
[24] E. Hairer,et al. Geometric Numerical Integration: Structure Preserving Algorithms for Ordinary Differential Equations , 2004 .
[25] Stephen P. Boyd,et al. Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.
[26] Yurii Nesterov,et al. Cubic regularization of Newton method and its global performance , 2006, Math. Program..
[27] Yan V Fyodorov,et al. Replica Symmetry Breaking Condition Exposed by Random Matrix Calculation of Landscape Complexity , 2007, cond-mat/0702601.
[28] A. Bray,et al. Statistics of critical points of Gaussian fields on large-dimensional spaces , 2006, Physical Review Letters.
[29] Yoshua Bengio,et al. Learning Deep Architectures for AI , 2007, Found. Trends Mach. Learn..
[30] P. Comon,et al. Tensor decompositions, alternating least squares and other tales , 2009 .
[31] Marc Teboulle,et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..
[32] Ohad Shamir,et al. Stochastic Convex Optimization , 2009, COLT.
[33] Vladimír Lacko,et al. On decompositional algorithms for uniform sampling from n-spheres and n-balls , 2010, J. Multivar. Anal..
[34] Nicholas I. M. Gould,et al. On the Complexity of Steepest Descent, Newton's and Regularized Newton's Methods for Nonconvex Unconstrained Optimization Problems , 2010, SIAM J. Optim..
[35] Martin J. Wainwright,et al. Fast global convergence rates of gradient methods for high-dimensional statistical recovery , 2010, NIPS.
[36] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.
[37] Emmanuel J. Candès,et al. Tight Oracle Inequalities for Low-Rank Matrix Recovery From a Minimal Number of Noisy Random Measurements , 2011, IEEE Transactions on Information Theory.
[38] Yurii Nesterov,et al. Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems , 2012, SIAM J. Optim..
[39] Seungjin Choi,et al. Independent Component Analysis , 2009, Handbook of Natural Computing.
[40] Ohad Shamir,et al. Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization , 2011, ICML.
[41] Anima Anandkumar,et al. Fast Detection of Overlapping Communities via Online Tensor Methods on GPUs , 2013, ArXiv.
[42] Sham M. Kakade,et al. Learning mixtures of spherical gaussians: moment methods and spectral decompositions , 2012, ITCS '13.
[43] Yin Tat Lee,et al. Efficient Accelerated Coordinate Descent Methods and Faster Algorithms for Solving Linear Systems , 2013, 2013 IEEE 54th Annual Symposium on Foundations of Computer Science.
[44] Ryan P. Adams,et al. Contrastive Learning Using Spectral Methods , 2013, NIPS.
[45] Prateek Jain,et al. Low-rank matrix completion using alternating minimization , 2012, STOC '13.
[46] M. Betancourt,et al. The Geometric Foundations of Hamiltonian Monte Carlo , 2014, 1410.5110.
[47] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, ArXiv.
[48] Surya Ganguli,et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.
[49] Surya Ganguli,et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks , 2013, ICLR.
[50] Anima Anandkumar,et al. Tensor decompositions for learning latent variable models , 2012, J. Mach. Learn. Res..
[51] A. Dalalyan. Theoretical guarantees for approximate sampling from smooth and log-concave densities , 2014, 1412.7392.
[52] Prateek Jain,et al. Computing Matrix Squareroot via Non Convex Local Search , 2015, ArXiv.
[53] Mohit Singh,et al. A geometric alternative to Nesterov's accelerated gradient descent , 2015, ArXiv.
[54] Sébastien Bubeck,et al. Convex Optimization: Algorithms and Complexity , 2014, Found. Trends Mach. Learn..
[55] Sanjeev Arora,et al. Provable ICA with Unknown Gaussian Noise, and Implications for Gaussian Mixtures and Autoencoders , 2012, Algorithmica.
[56] Alexandre M. Bayen,et al. Accelerated Mirror Descent in Continuous and Discrete Time , 2015, NIPS.
[57] Zhi-Quan Luo,et al. Guaranteed Matrix Completion via Non-Convex Factorization , 2014, IEEE Transactions on Information Theory.
[58] Prateek Jain,et al. Phase Retrieval Using Alternating Minimization , 2013, IEEE Transactions on Signal Processing.
[59] Furong Huang,et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.
[60] Xiaodong Li,et al. Phase Retrieval via Wirtinger Flow: Theory and Algorithms , 2014, IEEE Transactions on Information Theory.
[61] John Wright,et al. A Geometric Analysis of Phase Retrieval , 2016, 2016 IEEE International Symposium on Information Theory (ISIT).
[62] Kenji Kawaguchi. Deep Learning without Poor Local Minima , 2016, NIPS.
[63] Mark W. Schmidt,et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition , 2016, ECML/PKDD.
[64] Saeed Ghadimi,et al. Accelerated gradient methods for nonconvex nonlinear and stochastic programming , 2013, Mathematical Programming.
[65] Nicolas Boumal,et al. The non-convex Burer-Monteiro approach works on smooth semidefinite programs , 2016, NIPS.
[66] John D. Lafferty,et al. Convergence Analysis for Rectangular Matrix Completion Using Burer-Monteiro Factorization and Gradient Descent , 2016, ArXiv.
[67] Nathan Srebro,et al. Global Optimality of Local Search for Low Rank Matrix Recovery , 2016, NIPS.
[68] Stephen P. Boyd,et al. A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights , 2014, J. Mach. Learn. Res..
[69] Andre Wibisono,et al. A variational perspective on accelerated methods in optimization , 2016, Proceedings of the National Academy of Sciences.
[70] Nicolas Boumal,et al. On the low-rank approach for semidefinite programs arising in synchronization and community detection , 2016, COLT.
[71] Yair Carmon,et al. Accelerated Methods for Non-Convex Optimization , 2016, SIAM J. Optim..
[72] Kfir Y. Levy,et al. The Power of Normalization: Faster Evasion of Saddle Points , 2016, ArXiv.
[73] Yair Carmon,et al. Gradient Descent Efficiently Finds the Cubic-Regularized Non-Convex Newton Step , 2016, ArXiv.
[74] É. Moulines,et al. Sampling from a strongly log-concave distribution with the Unadjusted Langevin Algorithm , 2016 .
[75] Michael I. Jordan,et al. Gradient Descent Only Converges to Minimizers , 2016, COLT.
[76] Tengyu Ma,et al. Matrix Completion has No Spurious Local Minimum , 2016, NIPS.
[77] Zeyuan Allen-Zhu,et al. Variance Reduction for Faster Non-Convex Optimization , 2016, ICML.
[78] Alexander J. Smola,et al. Stochastic Variance Reduction for Nonconvex Optimization , 2016, ICML.
[79] Michael I. Jordan,et al. A Lyapunov Analysis of Momentum Methods in Optimization , 2016, ArXiv.
[80] Tong Zhang,et al. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization , 2013, Mathematical Programming.
[81] Michael I. Jordan,et al. Gradient Descent Can Take Exponential Time to Escape Saddle Points , 2017, NIPS.
[82] Yi Zheng,et al. No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis , 2017, ICML.
[83] John Wright,et al. Complete Dictionary Recovery Over the Sphere I: Overview and the Geometric Picture , 2015, IEEE Transactions on Information Theory.
[84] Yair Carmon,et al. "Convex Until Proven Guilty": Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions , 2017, ICML.
[85] Prateek Jain,et al. Accelerating Stochastic Gradient Descent , 2017, ArXiv.
[86] Michael O'Neill,et al. Behavior of Accelerated Gradient Methods Near Critical Points of Nonconvex Problems , 2017 .
[87] Zeyuan Allen-Zhu,et al. Linear Coupling: An Ultimate Unification of Gradient and Mirror Descent , 2014, ITCS.
[88] Andrea Montanari,et al. Solving SDPs for synchronization and MaxCut problems via the Grothendieck inequality , 2017, COLT.
[89] Anastasios Kyrillidis,et al. Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach , 2016, AISTATS.
[90] Michael I. Jordan,et al. How to Escape Saddle Points Efficiently , 2017, ICML.
[91] Tengyu Ma,et al. Finding approximate local minima faster than gradient descent , 2016, STOC.
[92] Michael I. Jordan,et al. Accelerated Gradient Descent Escapes Saddle Points Faster than Gradient Descent , 2017, COLT.
[93] Michael I. Jordan,et al. Underdamped Langevin MCMC: A non-asymptotic analysis , 2017, COLT.
[94] Stephen J. Wright,et al. Complexity Analysis of Second-Order Line-Search Algorithms for Smooth Nonconvex Optimization , 2017, SIAM J. Optim..
[95] Michael I. Jordan,et al. On Symplectic Optimization , 2018, 1802.03653.
[96] Yurii Nesterov,et al. Linear convergence of first order methods for non-strongly convex optimization , 2015, Math. Program..
[97] Huan Li,et al. Provable accelerated gradient method for nonconvex low rank optimization , 2017, Machine Learning.
[98] Geelon So. Tensor Decompositions , 2021, Matrix and Tensor Decompositions in Signal Processing.