Variational Properties of Value Functions

Regularization plays a key role in a variety of optimization formulations of inverse problems. A recurring theme in regularization approaches is the selection of regularization parameters and their effect on the solution and on the optimal value of the optimization problem. The sensitivity of the value function to the regularization parameter can be linked directly to the Lagrange multipliers. This paper characterizes the variational properties of the value functions for a broad class of convex formulations, not all of which are covered by standard Lagrange multiplier theory. An inverse function theorem is given that links the value functions of different regularization formulations (not necessarily convex). These results have implications for the selection of regularization parameters and for the development of specialized algorithms. Numerical examples illustrate the theoretical results.
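
To make the link between value-function sensitivity and Lagrange multipliers concrete, here is a minimal sketch on a toy problem that is not taken from the paper. It assumes the smooth constrained formulation v(tau) = min 0.5*||Ax - b||^2 subject to 0.5*||x||^2 <= tau, with A diagonal so that the solution for a given multiplier lam has a closed form. All function names (solve, misfit, level, value_slope) are hypothetical. Standard multiplier theory predicts dv/dtau = -lam in this smooth case, and the code checks that prediction by finite differences.

```python
# Toy check that the slope of the value function equals minus the multiplier.
# Assumption: A = diag(s), so the stationarity condition
#   A^T (A x - b) + lam * x = 0
# can be solved coordinate by coordinate.

def solve(s, b, lam):
    """Closed-form minimizer of 0.5*||diag(s) x - b||^2 + lam * 0.5*||x||^2."""
    return [si * bi / (si * si + lam) for si, bi in zip(s, b)]

def misfit(s, b, x):
    """Objective value 0.5*||diag(s) x - b||^2, i.e. v(tau) at the solution."""
    return 0.5 * sum((si * xi - bi) ** 2 for si, bi, xi in zip(s, b, x))

def level(x):
    """Constraint level tau = 0.5*||x||^2 attained by x."""
    return 0.5 * sum(xi * xi for xi in x)

def value_slope(s, b, lam, h=1e-5):
    """Estimate dv/dtau by central differences along the path lam -> x(lam)."""
    x_lo = solve(s, b, lam - h)
    x_hi = solve(s, b, lam + h)
    dv = misfit(s, b, x_hi) - misfit(s, b, x_lo)
    dtau = level(x_hi) - level(x_lo)
    return dv / dtau

if __name__ == "__main__":
    s = [3.0, 1.0, 0.5]   # diagonal of A (illustrative values)
    b = [1.0, -2.0, 0.5]  # data (illustrative values)
    lam = 0.7
    # The estimated slope should be close to -lam.
    print(value_slope(s, b, lam), -lam)
```

The sketch traces the value function by sweeping the multiplier rather than by solving the constrained problem for each tau; this works here because the constraint is active along the whole path. Nonsmooth regularizers, which are among the formulations the abstract says are not all covered by standard theory, would need a different treatment.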
