Sequential Quadratic Optimization for Nonlinear Equality Constrained Stochastic Optimization

Sequential quadratic optimization algorithms are proposed for solving smooth nonlinear optimization problems with equality constraints. The main focus is an algorithm proposed for the case when the constraint functions are deterministic, and constraint function and derivative values can be computed explicitly, but the objective function is stochastic. It is assumed in this setting that it is intractable to compute objective function and derivative values explicitly, although one can compute stochastic function and gradient estimates. As a starting point for this stochastic setting, an algorithm is proposed for the deterministic setting that is modeled after a state-of-the-art line-search SQP algorithm, but uses a stepsize selection scheme based on Lipschitz constants (or adaptively estimated Lipschitz constants) in place of the line search. This sets the stage for the proposed algorithm for the stochastic setting, for which it is assumed that line searches would be intractable. Under reasonable assumptions, convergence (resp., convergence in expectation) from remote starting points is proved for the proposed deterministic (resp., stochastic) algorithm. The results of numerical experiments demonstrate the practical performance of our proposed techniques.
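To make the setting concrete, the following is a minimal sketch of one iteration of an SQP method of the kind described: at an iterate x, a search direction d is obtained by solving the standard SQP subproblem, minimize g'd + (1/2)d'Hd subject to c + Jd = 0, via its Newton-KKT linear system, after which a fixed stepsize derived from a Lipschitz constant is taken in place of a line search. The toy problem, the identity Hessian approximation, the noise model for the stochastic gradient estimate, and the particular stepsize rule alpha = 1/L are all illustrative assumptions, not the specific scheme analyzed in the paper.

```python
import numpy as np

def sqp_step(x, grad_est, c_val, J, H, alpha):
    """One SQP step: solve the Newton-KKT system for the primal
    direction d, then take a fixed (Lipschitz-based) stepsize alpha."""
    n, m = H.shape[0], J.shape[0]
    # Assemble the KKT matrix [[H, J^T], [J, 0]] and right-hand side.
    K = np.block([[H, J.T], [J, np.zeros((m, m))]])
    rhs = np.concatenate([-grad_est, -c_val])
    sol = np.linalg.solve(K, rhs)
    d = sol[:n]  # sol[n:] holds the multiplier estimate, unused here
    return x + alpha * d

# Toy problem: min x1^2 + x2^2  s.t.  x1 + x2 = 1 (solution (0.5, 0.5)).
rng = np.random.default_rng(0)
x = np.array([2.0, -1.0])
J = np.array([[1.0, 1.0]])   # constraint Jacobian (constant for this example)
H = np.eye(2)                # Hessian approximation
L = 2.0                      # Lipschitz constant of the objective gradient
alpha = 1.0 / L              # illustrative Lipschitz-based stepsize
for _ in range(200):
    g = 2.0 * x + rng.normal(0.0, 0.01, size=2)  # stochastic gradient estimate
    c = np.array([x[0] + x[1] - 1.0])            # deterministic constraint value
    x = sqp_step(x, g, c, J, H, alpha)
```

With exact gradients this iteration contracts geometrically to the solution; with the noisy estimates above, the iterates hover in a small neighborhood of (0.5, 0.5) whose radius scales with the gradient noise, which is the qualitative behavior the convergence-in-expectation analysis formalizes.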
