Paper Information - Generalized Majorization-Minimization for Non-Convex Optimization

Majorization-Minimization (MM) algorithms optimize an objective function by iteratively minimizing a majorizing surrogate of it, and they offer attractively fast convergence rates for convex problems. However, their convergence behavior on non-convex problems remains unclear. In this paper, we propose a novel MM surrogate function that relaxes the requirement of strictly upper bounding the objective to bounding it in expectation. Building on this generalized notion of a surrogate, we develop a new optimization algorithm, termed SPI-MM, that leverages the recently proposed SPIDER for more efficient non-convex optimization. We prove that, for finite-sum problems, the SPI-MM algorithm converges to a stationary point within deterministic and lower stochastic gradient complexity. To the best of our knowledge, this work gives the first non-asymptotic convergence analysis for MM-like algorithms in general non-convex optimization. Extensive empirical studies on non-convex logistic regression and sparse PCA demonstrate the efficiency of the proposed algorithm and validate our theoretical results.
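
To make the basic MM idea concrete, the following is a minimal sketch of the classical deterministic scheme (strict majorization), not the SPI-MM algorithm of the paper. The test objective `f`, the constant `L`, and the helper name `mm_minimize` are assumptions chosen purely for illustration. Because the second derivative of `f` is bounded in absolute value by `L`, the quadratic g(y | x) = f(x) + f'(x)(y - x) + (L/2)(y - x)^2 upper-bounds `f` everywhere and touches it at x, so minimizing it at each step can never increase the objective.

```python
import math

# Illustrative non-convex objective (an assumption, not taken from the paper):
# f(x) = x^2 + 3 sin^2(x), whose second derivative 2 + 6 cos(2x) lies in [-4, 8].
def f(x):
    return x * x + 3.0 * math.sin(x) ** 2

def grad_f(x):
    # f'(x) = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

# Curvature bound: |f''(x)| <= 8, so the quadratic surrogate below majorizes f.
L = 8.0

def mm_minimize(x0, steps=200):
    """Classical MM with a quadratic majorizer (hypothetical helper)."""
    x = x0
    history = [f(x)]
    for _ in range(steps):
        # Surrogate: g(y | x) = f(x) + f'(x) (y - x) + (L / 2) (y - x)^2.
        # Its exact minimizer over y is x - f'(x) / L.
        x = x - grad_f(x) / L
        history.append(f(x))
    return x, history

if __name__ == "__main__":
    x_final, hist = mm_minimize(2.5)
    print("final x:", x_final)
    print("final objective:", hist[-1])
    print("gradient magnitude:", abs(grad_f(x_final)))
```

With this particular surrogate the MM update coincides with gradient descent at step size 1/L. The generalized surrogate described in the abstract relaxes the strict upper-bound requirement to hold only in expectation, which this deterministic sketch does not attempt to reproduce.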
