MM Algorithms for Minimizing Nonsmoothly Penalized Objective Functions (Jan 2011)

The use of regularization, or penalization, has become increasingly common in high-dimensional statistical analysis over the past decade, where a common goal is to simultaneously select important variables and estimate their effects. It has been shown by several authors that these goals can be achieved by minimizing some parameter-dependent “goodness of fit” function (e.g., a negative log-likelihood) subject to a penalization that promotes sparsity. Penalty functions that are nonsmooth (i.e., not differentiable) at the origin have received substantial attention, arguably beginning with LASSO (Tibshirani, 1996). The current literature tends to focus on specific combinations of smooth data fidelity (i.e., goodness-of-fit) and nonsmooth penalty functions. One result of this combined specificity has been a proliferation in the number of computational algorithms designed to solve fairly narrow classes of optimization problems involving objective functions that are not everywhere continuously differentiable. In this paper, we propose a general class of algorithms for optimizing an extensive variety of nonsmoothly penalized objective functions that satisfy certain regularity conditions. The proposed framework utilizes the majorization-minimization (MM) algorithm as its core optimization engine. The resulting algorithms rely on iterated soft-thresholding, implemented componentwise, allowing for fast, stable updating that avoids the need for any high-dimensional matrix inversion. We establish a local convergence theory for this class of algorithms under weaker assumptions than previously considered in the statistical literature. We also demonstrate the exceptional effectiveness of new acceleration methods, originally proposed for the EM algorithm, in this class of problems. Simulation results and a microarray data example are provided to demonstrate the algorithm’s capabilities and versatility.
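
As a rough illustration of the componentwise soft-thresholding update mentioned above, the following Python sketch treats only the simplest special case, least squares with a LASSO penalty. It is not the general algorithm proposed in the paper: the function names (`soft_threshold`, `mm_lasso`, `objective`), the choice of majorizer (a quadratic with curvature `L` equal to the largest eigenvalue of X'X), and the stopping rule are all assumptions made for this example, and `X` is assumed to be nonzero.

```python
import numpy as np


def soft_threshold(z, t):
    """Componentwise soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def objective(X, y, b, lam):
    """Penalized objective 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    r = y - X @ b
    return 0.5 * (r @ r) + lam * np.sum(np.abs(b))


def mm_lasso(X, y, lam, n_iter=500, tol=1e-10):
    """Minimize the objective above by iterated soft-thresholding.

    Illustrative assumption: the smooth least-squares term is majorized at
    the current iterate by a quadratic with curvature L >= largest
    eigenvalue of X'X. The surrogate then separates across coordinates,
    so each update needs no matrix inversion.
    """
    p = X.shape[1]
    # Squared spectral norm of X equals the largest eigenvalue of X'X.
    L = np.linalg.norm(X, 2) ** 2
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)  # gradient of the smooth term at b
        b_new = soft_threshold(b - grad / L, lam / L)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return b
```

With the identity design `X = np.eye(p)`, the first update already returns `soft_threshold(y, lam)`, which is the closed-form LASSO solution in that case.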

[1]  Haifen Li,et al.  Induced smoothing for the semiparametric accelerated hazards model , 2012, Comput. Stat. Data Anal..

[2]  I. Lossos,et al.  Transformation of follicular lymphoma. , 2011, Best practice & research. Clinical haematology.

[3]  I. Gijbels,et al.  Penalized likelihood regression for generalized linear models with non-quadratic penalties , 2011 .

[4]  Trevor Hastie,et al.  Regularization Paths for Generalized Linear Models via Coordinate Descent. , 2010, Journal of statistical software.

[5]  Robert Tibshirani,et al.  Survival analysis with high-dimensional covariates , 2010, Statistical methods in medical research.

[6]  R. Strawderman,et al.  Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data. , 2009, Biometrika.

[7]  Hao Helen Zhang,et al.  ON THE ADAPTIVE ELASTIC-NET WITH A DIVERGING NUMBER OF PARAMETERS. , 2009, Annals of statistics.

[8]  Insuk Sohn,et al.  Gradient lasso for Cox proportional hazards model , 2009, Bioinform..

[9]  Yi Li,et al.  Survival Analysis with High-Dimensional Covariates: An Application in Microarray Studies , 2011, Statistical Applications in Genetics and Molecular Biology.

[10]  Lorenzo Rosasco,et al.  Elastic-net regularization in learning theory , 2008, J. Complex..

[11]  Robert J Tibshirani,et al.  Statistical Applications in Genetics and Molecular Biology , 2011 .

[12]  Adrian E. Raftery,et al.  Iterative Bayesian Model Averaging: a method for the application of survival analysis to high-dimensional microarray data , 2009, BMC Bioinformatics.

[13]  Harald Binder,et al.  Incorporating pathway information into boosting estimation of high-dimensional risk prediction models , 2009, BMC Bioinformatics.

[14]  Jianqing Fan,et al.  Ultrahigh Dimensional Variable Selection: beyond the linear model , 2008, 0812.3201.

[15]  Yongdai Kim,et al.  Smoothly Clipped Absolute Deviation on High Dimensions , 2008 .

[16]  Yin Zhang,et al.  Fixed-Point Continuation for l1-Minimization: Methodology and Convergence , 2008, SIAM J. Optim..

[17]  H. Zou,et al.  One-step Sparse Estimates in Nonconcave Penalized Likelihood Models. , 2008, Annals of statistics.

[18]  R. Varadhan,et al.  Simple and Globally Convergent Methods for Accelerating the Convergence of Any EM Algorithm , 2008 .

[19]  Brent A. Johnson,et al.  Penalized Estimating Functions and Variable Selection in Semiparametric Regression Models , 2008, Journal of the American Statistical Association.

[20]  Alfred O. Hero,et al.  On EM algorithms and their proximal generalizations , 2008, 1201.5912.

[21]  Harald Binder,et al.  Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models , 2008, BMC Bioinformatics.

[22]  Mee Young Park,et al.  L1‐regularization path algorithm for generalized linear models , 2007 .

[23]  S. Rosset,et al.  Piecewise linear regularized solution paths , 2007, 0708.2197.

[24]  Cun-Hui Zhang PENALIZED LINEAR UNBIASED SELECTION , 2007 .

[25]  Jian Huang,et al.  Additive risk survival model with microarray data , 2007, BMC Bioinformatics.

[26]  Min Zhang,et al.  Bayesian profiling of molecular signatures to predict event times , 2007, Theoretical Biology and Medical Modelling.

[27]  H. Zou The Adaptive Lasso and Its Oracle Properties , 2006 .

[28]  Marina Vannucci,et al.  Bayesian Variable Selection for the Analysis of Microarray Data with Censored Outcomes , 2022, Bioinformatics.

[29]  Romain Neugebauer,et al.  Cross-Validated Bagged Prediction of Survival , 2006, Statistical applications in genetics and molecular biology.

[30]  Ch. Roland,et al.  New iterative schemes for nonlinear fixed point problems, with applications to problems with bifurcations and incomplete-data problems , 2005 .

[31]  Jiang Gui,et al.  Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data , 2005, Bioinform..

[32]  Hongzhe Li,et al.  Boosting proportional hazards models using smoothing splines, with applications to high-dimensional microarray data , 2005, Bioinform..

[33]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[34]  Jiang Gui,et al.  Threshold Gradient Descent Method for Censored Data Regression with Applications in Pharmacogenomics , 2004, Pacific Symposium on Biocomputing.

[35]  Patrick L. Combettes,et al.  Signal Recovery by Proximal Forward-Backward Splitting , 2005, Multiscale Model. Simul..

[36]  F. Vaida PARAMETER CONVERGENCE FOR EM AND MM ALGORITHMS , 2005 .

[37]  R. Tibshirani,et al.  Least angle regression , 2004, math/0406456.

[38]  Paul Tseng,et al.  An Analysis of the EM Algorithm and Entropy-Like Proximal Point Methods , 2004, Math. Oper. Res..

[39]  Jiang Gui,et al.  Partial Cox regression analysis for high-dimensional microarray gene expression data , 2004, ISMB/ECCB.

[40]  佐伯 和子 The B cell-specific major raft protein, Raftlin, is necessary for the integrity of lipid raft and BCR signal transduction , 2004 .

[41]  Hiroyuki Honda,et al.  Multiple fuzzy neural network system for outcome prediction and classification of 220 lymphoma patients on the basis of molecular profiling , 2003, Cancer science.

[42]  I. Daubechies,et al.  An iterative thresholding algorithm for linear inverse problems with a sparsity constraint , 2003, math/0307152.

[43]  A. Yoshimura,et al.  The B cell‐specific major raft protein, Raftlin, is necessary for the integrity of lipid raft and BCR signal transduction , 2003, The EMBO journal.

[44]  Ash A. Alizadeh,et al.  Transformation of follicular lymphoma to diffuse large cell lymphoma is associated with a heterogeneous set of DNA copy number and gene expression alterations. , 2003, Blood.

[45]  Meland,et al.  The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. , 2002, The New England journal of medicine.

[46]  Jianqing Fan,et al.  Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[47]  Ash A. Alizadeh,et al.  Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling , 2000, Nature.

[48]  Mila Nikolova,et al.  Local Strong Homogeneity of a Regularized Estimator , 2000, SIAM J. Appl. Math..

[49]  K. Lange,et al.  EM algorithms without missing data , 1997, Statistical methods in medical research.

[50]  G. McLachlan,et al.  The EM algorithm and extensions , 1996 .

[51]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[52]  Y. Ritov,et al.  Monotone Estimating Equations for Censored Data , 1994 .

[53]  J. Hiriart-Urruty,et al.  Convex analysis and minimization algorithms , 1993 .

[54]  Niels Keiding,et al.  Statistical Models Based on Counting Processes , 1993 .

[55]  Donald Geman,et al.  Constrained Restoration and the Recovery of Discontinuities , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[56]  B. Lindsay,et al.  Monotonicity of quadratic-approximation algorithms , 1988 .

[57]  E. Polak On the mathematical foundations of nondifferentiable optimization in engineering design , 1987 .

[58]  New York Dover,et al.  ON THE CONVERGENCE PROPERTIES OF THE EM ALGORITHM , 1983 .

[59]  R. Meyer A Comparison of the Forcing Function and Point-to-Set Mapping Approaches to Convergence Analysis , 1977 .

[60]  Robert R. Meyer,et al.  Sufficient Conditions for the Convergence of Monotonic Mathematical Programming Algorithms , 1976, J. Comput. Syst. Sci..

[61]  David R. Cox,et al.  Regression models and life tables (with discussion) , 1972 .

[62]  E. M. L. Beale,et al.  Nonlinear Programming: A Unified Approach. , 1970 .