Generalized Majorization-Minimization for Non-Convex Optimization

Majorization-Minimization (MM) algorithms optimize an objective function by iteratively minimizing a majorizing surrogate of it, and they offer attractively fast convergence rates for convex problems. However, their convergence behavior for non-convex problems remains unclear. In this paper, we propose a novel MM surrogate function that relaxes the requirement of strictly upper bounding the objective to bounding the objective in expectation. With this generalized notion of a surrogate, we develop a new optimization algorithm, termed SPI-MM, that leverages the recently proposed SPIDER for more efficient non-convex optimization. We prove that for finite-sum problems, the SPI-MM algorithm converges to a stationary point within deterministic and lower stochastic gradient complexity. To the best of our knowledge, this work gives the first non-asymptotic convergence analysis for MM-like algorithms in general non-convex optimization. Extensive empirical studies on non-convex logistic regression and sparse PCA demonstrate the efficiency advantage of the proposed algorithm and validate our theoretical results.
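As an illustration of the classical scheme that the abstract starts from, the following minimal sketch runs deterministic MM with a quadratic surrogate on a one-dimensional non-convex function. It is not the SPI-MM algorithm: the test function, the constant `L_SMOOTH`, and all function names are assumptions chosen for this example only, and the surrogate here is a strict upper bound rather than a bound in expectation.

```python
import math

# Hypothetical test objective (not from the paper): f(x) = x^2 + 3 sin^2(x).
# It is non-convex because f''(x) = 2 + 6 cos(2x) is negative for some x,
# and |f''(x)| <= 8, so its gradient is Lipschitz with constant 8.
L_SMOOTH = 8.0


def f(x):
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def surrogate(x, anchor):
    """Quadratic majorizer of f built at `anchor`.

    It touches f at the anchor and lies above f everywhere else,
    which follows from the Lipschitz bound on the gradient.
    """
    d = x - anchor
    return f(anchor) + grad_f(anchor) * d + 0.5 * L_SMOOTH * d * d


def mm_minimize(x0, num_iters=200):
    """Classical MM: repeatedly minimize the surrogate built at the current iterate."""
    xs = [x0]
    for _ in range(num_iters):
        x = xs[-1]
        # The quadratic surrogate has a closed-form minimizer.
        xs.append(x - grad_f(x) / L_SMOOTH)
    return xs


trajectory = mm_minimize(x0=2.5)
values = [f(x) for x in trajectory]
```

Because each surrogate upper bounds the objective and matches it at the current iterate, the objective values in `values` cannot increase from one iteration to the next. This strict majorization is the property that the generalized surrogate described above relaxes to hold only in expectation.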
