On Landscape of Lagrangian Functions and Stochastic Search for Constrained Nonconvex Optimization

We study constrained nonconvex optimization problems that arise in machine learning, signal processing, and stochastic control. It is well known that such problems can be reformulated as minimax problems in Lagrangian form. Due to the lack of convexity, however, their landscape is not well understood, and how to find the stable equilibria of the Lagrangian function remains an open question. To bridge this gap, we study the landscape of the Lagrangian function and define a special class of Lagrangian functions enjoying two properties: (1) every equilibrium is either stable or unstable (formal definitions in Section 2); (2) the stable equilibria correspond to the global optima of the original problem. We show that the generalized eigenvalue (GEV) problem, which subsumes canonical correlation analysis among other problems, belongs to this class. Specifically, we characterize its stable and unstable equilibria by leveraging an invariant group and symmetry properties (details in Section 3). Motivated by these neat geometric structures, we propose a simple and efficient stochastic primal-dual algorithm for solving the online GEV problem. Theoretically, we provide sufficient conditions under which we establish an asymptotic rate of convergence and obtain the first sample complexity result for the online GEV problem via diffusion approximations, which are widely used in applied probability and stochastic control. Numerical results are provided to support our theory.
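
To make the setting concrete, below is a minimal sketch (not necessarily the authors' exact algorithm) of a stochastic primal-dual iteration for the GEV problem max_u u^T A u subject to u^T B u = 1, whose Lagrangian is L(u, lam) = u^T A u - lam * (u^T B u - 1). The `sample_stream` interface, the step size, and the specific update rules are illustrative assumptions; the only structural requirement is that unbiased estimates of A and B arrive online.

```python
import numpy as np

def stochastic_primal_dual_gev(sample_stream, dim, eta=1e-3, n_iter=20000, seed=0):
    """Sketch of a stochastic primal-dual iteration for the online GEV problem
        max_u  u^T A u   s.t.  u^T B u = 1,
    with Lagrangian L(u, lam) = u^T A u - lam * (u^T B u - 1).
    `sample_stream(t)` is a hypothetical interface returning unbiased
    estimates (A_hat, B_hat) of A and B at iteration t."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(dim)
    u /= np.linalg.norm(u)
    lam = 0.0
    for t in range(n_iter):
        A_hat, B_hat = sample_stream(t)
        # Primal ascent on u: grad_u L = 2*A*u - 2*lam*B*u (stochastic estimate).
        grad_u = 2.0 * (A_hat @ u) - 2.0 * lam * (B_hat @ u)
        # Dual step drives lam toward enforcing the constraint u^T B u = 1.
        gap = u @ (B_hat @ u) - 1.0
        u = u + eta * grad_u
        lam = lam + eta * gap
    return u, lam

if __name__ == "__main__":
    d = 4
    rng = np.random.default_rng(1)
    M = rng.standard_normal((d, d))
    A = M @ M.T           # a synthetic SPD matrix A
    B = np.eye(d)         # B = I reduces the GEV problem to a leading-eigenvector problem
    def stream(t, noise=0.1, rng=np.random.default_rng(2)):
        # Hypothetical noisy but unbiased symmetric estimates of A and B.
        E = noise * rng.standard_normal((d, d))
        F = noise * rng.standard_normal((d, d))
        return A + (E + E.T) / 2.0, B + (F + F.T) / 2.0
    u, lam = stochastic_primal_dual_gev(stream, d)
    # At a stable equilibrium, A u ~ lam * B u and u^T B u ~ 1 should hold.
    print("eigen-residual:", np.linalg.norm(A @ u - lam * (B @ u)))
    print("constraint gap:", u @ (B @ u) - 1.0)
```

In this toy run, at an equilibrium the first-order conditions give A u = lam B u with u^T B u = 1, i.e., (u, lam) is a generalized eigenpair; whether the iterates are attracted to the stable equilibrium depends on the step size eta being small relative to the noise level and the spectrum of (A, B).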
