Majorization-Minimization algorithms for nonsmoothly penalized objective functions

The use of penalization, or regularization, has become common in high-dimensional statistical analysis, where an increasingly frequent goal is to simultaneously select important variables and estimate their effects. It has been shown by several authors that these goals can be achieved by minimizing some parameter-dependent "goodness-of-fit" function (e.g., a negative loglikelihood) subject to a penalization that promotes sparsity. Penalty functions that are singular at the origin have received substantial attention, arguably beginning with the Lasso penalty (62). The current literature tends to focus on specific combinations of differentiable goodness-of-fit functions and penalty functions singular at the origin. One result of this combined specificity has been a proliferation in the number of computational algorithms designed to solve fairly narrow classes of optimization problems involving objective functions that are not everywhere continuously differentiable. In this paper, we propose a general class of algorithms for optimizing an extensive variety of nonsmoothly penalized objective functions that satisfy certain regularity conditions. The proposed framework utilizes the majorization-minimization (MM) algorithm as its core optimization engine. In the case of penalized regression models, the resulting algorithms employ iterated soft-thresholding, implemented componentwise, allowing for fast and stable updating that avoids the need for inverting high-dimensional matrices. We establish convergence theory under weaker assumptions than previously considered in the statistical literature. We also demonstrate the exceptional effectiveness of new acceleration methods, originally proposed for the EM algorithm, in this class of problems. Simulation results and a microarray data example are provided to demonstrate the algorithm's capabilities and versatility.

AMS 2000 subject classifications: Primary 65C60, 62J07; secondary 62J05, 62J12.
Keywords and phrases: Convex optimization, iterative soft thresholding, Lasso penalty, minimax concave penalty, non-convex optimization, smoothly clipped absolute deviation penalty.
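
The componentwise iterated soft-thresholding described in the abstract can be illustrated with a minimal sketch for one special case, Lasso-penalized least squares. This is not the paper's algorithm or code; it is a generic MM iteration in which the smooth least-squares term is majorized by a quadratic with curvature `L` (assumed here to be the squared spectral norm of `X`), so that each update reduces to a componentwise soft-threshold with no matrix inversion. The function names `soft_threshold`, `lasso_objective`, and `mm_soft_threshold_lasso` are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Componentwise soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_objective(X, y, beta, lam):
    """0.5 * ||y - X beta||^2 + lam * ||beta||_1."""
    resid = y - X @ beta
    return 0.5 * resid @ resid + lam * np.abs(beta).sum()


def mm_soft_threshold_lasso(X, y, lam, n_iter=500, tol=1e-10):
    """MM iteration for the Lasso objective (illustrative sketch only).

    The least-squares term is majorized at the current iterate by a
    quadratic with curvature L >= largest eigenvalue of X^T X, so the
    surrogate separates across coordinates and is minimized exactly by
    soft-thresholding.
    """
    L = np.linalg.norm(X, 2) ** 2  # squared spectral norm of X
    beta = np.zeros(X.shape[1])
    history = [lasso_objective(X, y, beta, lam)]
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y)  # gradient of the smooth part
        beta_new = soft_threshold(beta - grad / L, lam / L)
        history.append(lasso_objective(X, y, beta_new, lam))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, history


if __name__ == "__main__":
    # Synthetic data, for illustration only.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 10))
    y = X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(50)
    beta_hat, hist = mm_soft_threshold_lasso(X, y, lam=5.0)
    print("nonzero coefficients:", np.flatnonzero(beta_hat))
```

Because the quadratic surrogate lies above the objective and touches it at the current iterate, the objective values stored in `history` should be nonincreasing, which is the descent property that MM algorithms rely on.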

[1] James M. Ortega, et al. Iterative solution of nonlinear equations in several variables, 2014, Computer science and applied mathematics.

[2] Haifen Li, et al. Induced smoothing for the semiparametric accelerated hazards model, 2012, Comput. Stat. Data Anal.

[3] T. Hastie, et al. SparseNet: Coordinate Descent With Nonconvex Penalties, 2011, Journal of the American Statistical Association.

[4] B. Nan, et al. Survival Analysis with High-Dimensional Covariates, 2010.

[5] Elizabeth D. Schifano, et al. Topics In Penalized Estimation, 2010.

[6] Cun-Hui Zhang. Nearly unbiased variable selection under minimax concave penalty, 2010, arXiv:1002.4734.

[7] Trevor Hastie, et al. Regularization Paths for Generalized Linear Models via Coordinate Descent, 2010, Journal of statistical software.

[8] R. Strawderman, et al. Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data, 2009, Biometrika.

[9] Hao Helen Zhang, et al. On the Adaptive Elastic-Net with a Diverging Number of Parameters, 2009, Annals of statistics.

[10] I. Sohn, et al. Gradient lasso for Cox proportional hazards model, 2009, Bioinform.

[11] Adrian E. Raftery, et al. Iterative Bayesian Model Averaging: a method for the application of survival analysis to high-dimensional microarray data, 2009, BMC Bioinformatics.

[12] Harald Binder, et al. Incorporating pathway information into boosting estimation of high-dimensional risk prediction models, 2009, BMC Bioinformatics.

[13] Y. She, et al. Thresholding-based iterative selection procedures for model selection and shrinkage, 2008, arXiv:0812.5061.

[14] Jianqing Fan, et al. Ultrahigh Dimensional Variable Selection: beyond the linear model, 2008, arXiv:0812.3201.

[15] Yongdai Kim, et al. Smoothly Clipped Absolute Deviation on High Dimensions, 2008.

[16] Yin Zhang, et al. Fixed-Point Continuation for l1-Minimization: Methodology and Convergence, 2008, SIAM J. Optim.

[17] Peter Buhlmann, et al. Discussion: One-step sparse estimates in nonconcave penalized likelihood models, 2008, arXiv:0808.1013.

[18] Cun-Hui Zhang. Discussion: One-step sparse estimates in nonconcave penalized likelihood models, 2008, arXiv:0808.1025.

[19] H. Zou, et al. One-step Sparse Estimates in Nonconcave Penalized Likelihood Models, 2008, Annals of statistics.

[20] Lorenzo Rosasco, et al. Elastic-net regularization in learning theory, 2008, J. Complex.

[21] Paul Tseng, et al. A coordinate gradient descent method for nonsmooth separable minimization, 2008, Math. Program.

[22] Brent A. Johnson, et al. Penalized Estimating Functions and Variable Selection in Semiparametric Regression Models, 2008, Journal of the American Statistical Association.

[23] R. Varadhan, et al. Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm, 2008.

[24] Alfred O. Hero, et al. On EM algorithms and their proximal generalizations, 2008, arXiv:1201.5912.

[25] K. Lange, et al. Coordinate descent algorithms for lasso penalized regression, 2008, arXiv:0803.3876.

[26] Martin Schumacher, et al. Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models, 2008, BMC Bioinformatics.

[27] Mee Young Park, et al. L1-regularization path algorithm for generalized linear models, 2007.

[28] S. Rosset, et al. Piecewise linear regularized solution paths, 2007, arXiv:0708.2197.

[29] Shuangge Ma, et al. Additive risk survival model with microarray data, 2007, BMC Bioinformatics.

[30] H. Zou. The Adaptive Lasso and Its Oracle Properties, 2006.

[31] H. Zou, et al. Addendum: Regularization and variable selection via the elastic net, 2005.

[32] Ch. Roland, et al. New iterative schemes for nonlinear fixed point problems, with applications to problems with bifurcations and incomplete-data problems, 2005.

[33] Stephen P. Boyd, et al. Convex Optimization, 2004, IEEE Transactions on Automatic Control.

[34] D. Hunter, et al. Variable Selection using MM Algorithms, 2005, Annals of statistics.

[35] Jiang Gui, et al. Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data, 2005, Bioinform.

[36] Hongzhe Li, et al. Boosting proportional hazards models using smoothing splines, with applications to high-dimensional microarray data, 2005, Bioinform.

[37] Jiang Gui, et al. Threshold Gradient Descent Method for Censored Data Regression with Applications in Pharmacogenomics, 2004, Pacific Symposium on Biocomputing.

[38] Paul Tseng. An Analysis of the EM Algorithm and Entropy-Like Proximal Point Methods, 2004, Math. Oper. Res.

[39] D. Hunter, et al. A Tutorial on MM Algorithms, 2004.

[40] Hiroyuki Honda, et al. Multiple fuzzy neural network system for outcome prediction and classification of 220 lymphoma patients on the basis of molecular profiling, 2003, Cancer science.

[41] I. Daubechies, et al. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, 2003, arXiv:math/0307152.

[42] A. Yoshimura, et al. The B cell-specific major raft protein, Raftlin, is necessary for the integrity of lipid raft and BCR signal transduction, 2003, The EMBO journal.

[43] Ash A. Alizadeh, et al. Transformation of follicular lymphoma to diffuse large cell lymphoma is associated with a heterogeneous set of DNA copy number and gene expression alterations, 2003, Blood.

[44] Meland, et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma, 2002, The New England journal of medicine.

[45] Jianqing Fan, et al. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties, 2001.

[46] D. Hunter, et al. Optimization Transfer Using Surrogate Objective Functions, 2000.

[47] Ash A. Alizadeh, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, 2000, Nature.

[48] K. Lange, et al. EM algorithms without missing data, 1997, Statistical methods in medical research.

[49] G. McLachlan, et al. The EM algorithm and extensions, 1996.

[50] F. Coolen, et al. Statistical Models Based on Counting Processes, 1996.

[51] K. Lange. A gradient algorithm locally equivalent to the EM algorithm, 1995.

[52] Y. Ritov, et al. Monotone Estimating Equations for Censored Data, 1994.

[53] J. Hiriart-Urruty, et al. Convex analysis and minimization algorithms, 1993.

[54] Donald Geman, et al. Constrained Restoration and the Recovery of Discontinuities, 1992, IEEE Trans. Pattern Anal. Mach. Intell.

[55] B. Lindsay, et al. Monotonicity of quadratic-approximation algorithms, 1988.

[56] E. Polak. On the mathematical foundations of nondifferentiable optimization in engineering design, 1987.

[57] C. F. Jeff Wu. On the Convergence Properties of the EM Algorithm, 1983.

[58] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[59] R. Meyer. A Comparison of the Forcing Function and Point-to-Set Mapping Approaches to Convergence Analysis, 1977.

[60] Robert R. Meyer, et al. Sufficient Conditions for the Convergence of Monotonic Mathematical Programming Algorithms, 1976, J. Comput. Syst. Sci.

[61] E. M. L. Beale, et al. Nonlinear Programming: A Unified Approach, 1970.

[62] Robert J Tibshirani, et al. Statistical Applications in Genetics and Molecular Biology, 2011.

[63] I. Gijbels, et al. Penalized likelihood regression for generalized linear models with non-quadratic penalties, 2011.

[64] Yi Li, et al. Survival Analysis with High-Dimensional Covariates: An Application in Microarray Studies, 2011, Statistical Applications in Genetics and Molecular Biology.

[65] Cun-Hui Zhang. Penalized Linear Unbiased Selection, 2007.

[66] Romain Neugebauer, et al. Cross-Validated Bagged Prediction of Survival, 2006, Statistical applications in genetics and molecular biology.

[67] F. Vaida. Parameter Convergence for EM and MM Algorithms, 2005.

[68] Patrick L. Combettes, et al. Signal Recovery by Proximal Forward-Backward Splitting, 2005, Multiscale Model. Simul.

[69] Jiang Gui, et al. Partial Cox regression analysis for high-dimensional microarray gene expression data, 2004, ISMB/ECCB.

[70] Mila Nikolova. Local Strong Homogeneity of a Regularized Estimator, 2000, SIAM J. Appl. Math.

[71] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[72] F. Clarke. Optimization And Nonsmooth Analysis, 1983.

[73] David R. Cox, et al. Regression models and life tables (with discussion), 1972.

[74] R. Tibshirani, et al. Least angle regression, 2004, arXiv:math/0406456.

[75] Marina Vannucci, et al. Bayesian Variable Selection for the Analysis of Microarray Data with Censored Outcomes, 2022, Bioinformatics.

[76] Min Zhang, et al. Theoretical Biology and Medical Modelling, 2022.

[77] D., et al. Regression Models and Life-Tables, 2022.