Minimizing oracle-structured composite functions

We consider the problem of minimizing a composite convex function with two different access methods: an oracle, for which we can evaluate the value and gradient, and a structured function, which we access only by solving a convex optimization problem. We are motivated by two associated technological developments. For the oracle function, systems like PyTorch and TensorFlow can automatically and efficiently compute gradients, given a computation graph description. For the structured function, systems like CVXPY accept a high-level domain-specific language description of the problem and automatically translate it to a standard form for efficient solution. We develop a method that makes minimal assumptions about the two functions, does not require the tuning of algorithm parameters, and works well in practice across a variety of problems. Our algorithm combines several well-known ideas, including a low-rank quasi-Newton approximation of curvature, piecewise-affine lower bounds from bundle-type methods, and two types of damping to ensure stability. We illustrate the method on stochastic optimization, utility maximization, and risk-averse programming problems, showing that it is more efficient than standard solvers when the oracle function involves a large amount of data.
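
To make the two access methods concrete, the sketch below shows one way the setup could look in code, assuming Python with PyTorch and CVXPY. The oracle function is a hypothetical smooth loss evaluated with automatic differentiation, and the structured function is handled only through a CVXPY subproblem; the subproblem here is a simple damped first-order model step, not the paper's low-rank quasi-Newton/bundle algorithm. The data, loss, l1 term, constraint, and the parameter rho are illustrative assumptions.

```python
import cvxpy as cp
import numpy as np
import torch

n = 20  # number of variables

# ---- Oracle function f: value and gradient via automatic differentiation ----
# A hypothetical smooth loss over synthetic data stands in for the oracle; the
# paper's examples (stochastic optimization, utility maximization, risk-averse
# programming) would take its place.
A = torch.randn(500, n, dtype=torch.float64)
b = torch.randn(500, dtype=torch.float64)

def oracle(x_np):
    """Return f(x) and grad f(x) using PyTorch autodiff."""
    x = torch.tensor(x_np, dtype=torch.float64, requires_grad=True)
    f = torch.nn.functional.softplus(A @ x - b).mean()
    f.backward()
    return f.item(), x.grad.numpy()

# ---- Structured function g: accessed only by solving a convex problem ----
# One prototypical subproblem: minimize an affine model of f plus g plus a
# proximal (damping) term, written in CVXPY and translated to standard form.
def structured_step(x_k, f_k, grad_k, rho=1.0):
    x = cp.Variable(n)
    model_f = f_k + grad_k @ (x - x_k)             # affine lower bound on f
    g = cp.norm1(x)                                # example structured term
    damping = (rho / 2) * cp.sum_squares(x - x_k)  # proximal damping
    cp.Problem(cp.Minimize(model_f + g + damping),
               [cp.norm_inf(x) <= 1]).solve()
    return x.value

# A plain proximal-gradient-style loop driven by the two access methods.
x = np.zeros(n)
for _ in range(10):
    f_val, grad = oracle(x)
    x = structured_step(x, f_val, grad)
```

The point of the sketch is only the division of labor: every query of the oracle function goes through backward-mode autodiff, while the structured function is never evaluated directly and appears only inside the convex subproblem handed to the solver.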
