GS-OPT: A new fast stochastic algorithm for solving non-convex optimization problems

Non-convex optimization plays an important role in machine learning, yet its theoretical understanding remains rather limited. Designing efficient algorithms for non-convex optimization has attracted a great deal of attention, but such problems are usually NP-hard to solve. In this paper, we propose a new algorithm, GS-OPT (General Stochastic OPTimization), which is effective for solving non-convex problems. Our idea is to combine two stochastic bounds of the objective function, constructed using a common discrete probability distribution, the Bernoulli distribution. We study GS-OPT carefully from both the theoretical and the experimental perspectives. We also apply GS-OPT to the posterior inference problem in latent Dirichlet allocation. Empirical results show that our approach is often more efficient than previous ones.
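The abstract only sketches the construction, so the following is a minimal illustrative sketch of the stated idea: at each iteration a Bernoulli draw selects which of two stochastic bound surrogates of the objective drives the gradient step. The surrogate gradients (grad_lower, grad_upper), the mixing probability p, and the diminishing step-size schedule are all hypothetical placeholders for illustration, not the authors' actual GS-OPT procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_lower(x):
    # Hypothetical stochastic gradient of a lower-bound surrogate
    # of the objective (placeholder, not from the paper).
    return 2.0 * x + rng.normal(scale=0.1, size=x.shape)

def grad_upper(x):
    # Hypothetical stochastic gradient of an upper-bound surrogate
    # (placeholder, not from the paper).
    return 2.0 * x + 0.5 * np.sin(x) + rng.normal(scale=0.1, size=x.shape)

def gs_opt_sketch(x0, p=0.5, step=0.05, iters=200):
    """Bernoulli-mixed stochastic surrogate descent (illustrative only)."""
    x = np.asarray(x0, dtype=float)
    for t in range(1, iters + 1):
        b = rng.binomial(1, p)                   # Bernoulli draw selects a bound
        g = grad_upper(x) if b else grad_lower(x)
        x = x - (step / np.sqrt(t)) * g          # diminishing step size
    return x

print(gs_opt_sketch(np.array([3.0, -2.0])))
```

Under these placeholder surrogates the iterates contract toward the minimizer of the quadratic term; the sketch is only meant to convey how a Bernoulli variable can randomize between two bound-based surrogates within a standard stochastic descent loop.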
