Minimizing oracle-structured composite functions

We consider the problem of minimizing a composite convex function with two different access methods: an oracle, for which we can evaluate the value and gradient, and a structured function, which we access only by solving a convex optimization problem. We are motivated by two associated technological developments. For the oracle function, systems like PyTorch and TensorFlow can automatically and efficiently compute gradients, given a computation graph description. For the structured function, systems like CVXPY accept a high-level domain-specific language description of the problem and automatically translate it to a standard form for efficient solution. We develop a method that makes minimal assumptions about the two functions, does not require the tuning of algorithm parameters, and works well in practice across a variety of problems. Our algorithm combines several well-known ideas, including a low-rank quasi-Newton approximation of curvature, piecewise-affine lower bounds from bundle-type methods, and two types of damping to ensure stability. We illustrate the method on stochastic optimization, utility maximization, and risk-averse programming problems, showing that it is more efficient than standard solvers when the oracle function involves a large amount of data.
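
To make the two access methods concrete, the sketch below shows one way the setup could look in code, assuming Python with PyTorch and CVXPY. The oracle function is a hypothetical smooth loss evaluated with automatic differentiation, and the structured function is handled only through a CVXPY subproblem; the subproblem here is a simple damped first-order model step, not the paper's low-rank quasi-Newton/bundle algorithm. The data, loss, l1 term, constraint, and the parameter rho are illustrative assumptions.

```python
import cvxpy as cp
import numpy as np
import torch

n = 20  # number of variables

# ---- Oracle function f: value and gradient via automatic differentiation ----
# A hypothetical smooth loss over synthetic data stands in for the oracle; the
# paper's examples (stochastic optimization, utility maximization, risk-averse
# programming) would take its place.
A = torch.randn(500, n, dtype=torch.float64)
b = torch.randn(500, dtype=torch.float64)

def oracle(x_np):
    """Return f(x) and grad f(x) using PyTorch autodiff."""
    x = torch.tensor(x_np, dtype=torch.float64, requires_grad=True)
    f = torch.nn.functional.softplus(A @ x - b).mean()
    f.backward()
    return f.item(), x.grad.numpy()

# ---- Structured function g: accessed only by solving a convex problem ----
# One prototypical subproblem: minimize an affine model of f plus g plus a
# proximal (damping) term, written in CVXPY and translated to standard form.
def structured_step(x_k, f_k, grad_k, rho=1.0):
    x = cp.Variable(n)
    model_f = f_k + grad_k @ (x - x_k)             # affine lower bound on f
    g = cp.norm1(x)                                # example structured term
    damping = (rho / 2) * cp.sum_squares(x - x_k)  # proximal damping
    cp.Problem(cp.Minimize(model_f + g + damping),
               [cp.norm_inf(x) <= 1]).solve()
    return x.value

# A plain proximal-gradient-style loop driven by the two access methods.
x = np.zeros(n)
for _ in range(10):
    f_val, grad = oracle(x)
    x = structured_step(x, f_val, grad)
```

The point of the sketch is only the division of labor: every query of the oracle function goes through backward-mode autodiff, while the structured function is never evaluated directly and appears only inside the convex subproblem handed to the solver.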
