Stochastic regularized majorization-minimization with weakly convex and multi-convex surrogates
[1] Hanbaek Lyu. Convergence of block coordinate descent with diminishing radius for nonconvex optimization, 2020.
[2] D. Needell, et al. Online Nonnegative CP-dictionary Learning for Markovian Data, 2020, J. Mach. Learn. Res.
[3] D. Needell, et al. Online matrix factorization for Markovian data and applications to Network Dictionary Learning, 2019, ArXiv.
[4] Tamara G. Kolda, et al. Stochastic Gradients for Large-Scale Tensor Decomposition, 2019, SIAM J. Math. Data Sci.
[5] Rong Jin, et al. On the Convergence of (Stochastic) Gradient Descent with Extrapolation for Non-Convex Minimization, 2019, IJCAI.
[6] Wotao Yin, et al. Markov chain block coordinate descent, 2018, Computational Optimization and Applications.
[7] Wotao Yin, et al. On Markov Chain Gradient Descent, 2018, NeurIPS.
[8] Xiaoxia Wu, et al. AdaGrad stepsizes: Sharp convergence over nonconvex landscapes, from any initialization, 2018, ICML.
[9] Dmitriy Drusvyatskiy, et al. Stochastic model-based minimization of weakly convex functions, 2018, SIAM J. Optim.
[10] Amir Beck, et al. First-Order Methods in Optimization, 2017.
[11] Gaël Varoquaux, et al. Stochastic Subsampling for Factorizing Huge Matrices, 2017, IEEE Transactions on Signal Processing.
[12] Xiao Zhang, et al. A Unified Computational and Statistical Framework for Nonconvex Low-rank Matrix Estimation, 2016, AISTATS.
[13] Vincent Yan Fu Tan, et al. Online Nonnegative Matrix Factorization with General Divergences, 2016, AISTATS.
[14] Jorge Nocedal, et al. Optimization Methods for Large-Scale Machine Learning, 2016, SIAM Rev.
[15] Ya-Xiang Yuan, et al. Recent advances in trust region algorithms, 2015, Mathematical Programming.
[16] Stephen J. Wright. Coordinate descent algorithms, 2015, Mathematical Programming.
[17] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[18] Stephen P. Boyd, et al. Proximal Algorithms, 2013, Found. Trends Optim.
[19] Wotao Yin, et al. A Block Coordinate Descent Method for Regularized Multiconvex Optimization with Applications to Nonnegative Tensor Factorization and Completion, 2013, SIAM J. Imaging Sci.
[20] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.
[21] Julien Mairal, et al. Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization, 2013, NIPS.
[22] Julien Mairal, et al. Optimization with First-Order Surrogate Functions, 2013, ICML.
[23] Yurii Nesterov, et al. Gradient methods for minimizing composite functions, 2012, Mathematical Programming.
[24] Andrew Gelman, et al. Handbook of Markov Chain Monte Carlo, 2011.
[25] Yoram Singer, et al. Efficient Online and Batch Learning Using Forward Backward Splitting, 2009, J. Mach. Learn. Res.
[26] Guillermo Sapiro, et al. Online Learning for Matrix Factorization and Sparse Coding, 2009, J. Mach. Learn. Res.
[27] Mikael Johansson, et al. A Randomized Incremental Subgradient Method for Distributed Optimization in Networked Systems, 2009, SIAM J. Optim.
[28] O. Cappé, et al. On-line expectation–maximization algorithm for latent data models, 2009.
[29] Alexander Shapiro, et al. Stochastic Approximation approach to Stochastic Programming, 2013.
[30] Léon Bottou, et al. The Tradeoffs of Large Scale Learning, 2007, NIPS.
[31] Mikael Johansson, et al. A simple peer-to-peer algorithm for distributed optimization in sensor networks, 2007, 2007 46th IEEE Conference on Decision and Control.
[32] R. Tibshirani, et al. Least angle regression, 2004, math/0406456.
[33] L. Stefanski, et al. The Calculus of M-Estimation, 2002.
[34] S. R. Jammalamadaka, et al. Empirical Processes in M-Estimation, 2001.
[35] Luigi Grippo, et al. On the convergence of the block nonlinear Gauss-Seidel method under convex constraints, 2000, Oper. Res. Lett.
[36] D. Hunter, et al. Optimization Transfer Using Surrogate Objective Functions, 2000.
[37] R. Horst, et al. DC Programming: Overview, 1999.
[38] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.
[39] Krzysztof J. Cios, et al. Advances in neural information processing systems 7, 1997.
[40] C. Geyer. On the Asymptotics of Constrained M-Estimation, 1994.
[41] R. Durrett. Probability: Theory and Examples, 1993.
[42] J. Chang, et al. Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition, 1970.
[43] L. Tucker, et al. Some mathematical notes on three-mode factor analysis, 1966, Psychometrika.
[44] J. Zou, et al. ON OF STOCHASTIC GRADIENT ILL-POSED, 2020.
[45] D. Gleich. Trust Region Methods, 2017.
[46] Richard A. Levine, et al. Journal of Computational and Graphical Statistics, 2014.
[47] Y. Nesterov. Gradient methods for minimizing composite functions, 2013, Math. Program.
[48] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.
[49] Marc Teboulle, et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, 2009, SIAM J. Imaging Sci.
[50] A. Kleywegt, et al. Stochastic Optimization, 2003.
[51] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.
[52] Richard L. Tweedie, et al. Markov Chains and Stochastic Stability, 1993, Communications and Control Engineering Series.
[53] Richard A. Harshman, et al. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-model factor analysis, 1970.
[54] K. Schittkowski, et al. Nonlinear Programming, 2022.