Perspective maximum likelihood-type estimation via proximal decomposition

We introduce an optimization model for maximum likelihood-type estimation (M-estimation) that generalizes a large class of existing statistical models, including Huber's concomitant M-estimator, Owen's Huber/Berhu concomitant estimator, the scaled lasso, support vector machine regression, and penalized estimation with structured sparsity. The model, termed perspective M-estimation, leverages the observation that convex M-estimators with concomitant scale, as well as various regularizers, are instances of perspective functions. Such functions are amenable to proximal analysis, which leads to principled and provably convergent optimization algorithms via proximal splitting. Using a geometric approach based on duality, we derive novel proximity operators for several perspective functions of interest. Numerical experiments on synthetic and real-world data illustrate the broad applicability of the proposed framework.
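As a point of reference (these are the standard definitions, not notation drawn from the paper itself), the perspective of a proper lower semicontinuous convex function \(\varphi\) and the proximity operator that drives proximal splitting can be written as

\[
\widetilde{\varphi}(\eta,x)=
\begin{cases}
\eta\,\varphi(x/\eta), & \text{if } \eta>0,\\
(\operatorname{rec}\varphi)(x), & \text{if } \eta=0,\\
+\infty, & \text{if } \eta<0,
\end{cases}
\qquad
\operatorname{prox}_{f}(y)=\operatorname*{argmin}_{x}\Big(f(x)+\tfrac{1}{2}\|x-y\|^{2}\Big),
\]

where \(\operatorname{rec}\varphi\) denotes the recession function of \(\varphi\). For instance, the scaled-lasso objective

\[
\min_{b,\;\sigma>0}\;\frac{\|y-Xb\|_{2}^{2}}{2n\sigma}+\frac{\sigma}{2}+\lambda\|b\|_{1}
\]

has as its first two terms the perspective of \(u\mapsto\|u\|_{2}^{2}/(2n)+1/2\) evaluated at \((\sigma,\,y-Xb)\): this joint loss/scale structure is the kind the framework exploits.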
