Convex optimization methods for dimension reduction and coefficient estimation in multivariate linear regression

In this paper, we study convex optimization methods for computing the nuclear (or trace) norm regularized least squares estimate in multivariate linear regression. The so-called factor estimation and selection method, recently proposed by Yuan et al. (J Royal Stat Soc Ser B (Statistical Methodology) 69(3):329–346, 2007), conducts parameter estimation and factor selection simultaneously and has been shown to enjoy nice properties in both large and finite samples. Computing the estimates, however, can be very challenging in practice because of the high dimensionality and the nuclear norm constraint. In this paper, we explore a variant, due to Tseng, of Nesterov's smooth method, as well as interior point methods, for computing the penalized least squares estimate. The performance of these methods is then compared on a set of randomly generated instances. We show that the variant of Nesterov's smooth method generally outperforms, by a substantial margin, the interior point method implemented in SDPT3 version 4.0 (beta) (Toh et al. On the implementation and usage of SDPT3, a Matlab software package for semidefinite-quadratic-linear programming, version 4.0. Manuscript, Department of Mathematics, National University of Singapore, 2006). Moreover, the former method is much more memory efficient.
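To make the problem concrete, the estimate in question minimizes a least squares loss plus a nuclear norm penalty, min_B 0.5‖Y − XB‖²_F + λ‖B‖_*. The sketch below solves this with an accelerated proximal-gradient (FISTA-style) scheme, whose key ingredient is the proximal operator of the nuclear norm: soft-thresholding of singular values. This is only an illustrative sketch of the problem class, not the authors' algorithm or implementation; the function names and the fixed iteration count are our own choices.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def nuclear_norm_ls(X, Y, lam, iters=500):
    """Minimize 0.5*||Y - X B||_F^2 + lam*||B||_* by accelerated
    proximal gradient (a FISTA-style sketch, not the paper's method)."""
    L = np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the smooth part's gradient
    B = Z = np.zeros((X.shape[1], Y.shape[1]))
    t = 1.0
    for _ in range(iters):
        G = X.T @ (X @ Z - Y)              # gradient of 0.5*||Y - X Z||_F^2
        B_new = svt(Z - G / L, lam / L)    # gradient step, then nuclear-norm prox
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Z = B_new + ((t - 1.0) / t_new) * (B_new - B)   # momentum extrapolation
        B, t = B_new, t_new
    return B
```

A large λ drives all singular values of the iterates to zero, which recovers the rank-reducing behavior that makes this penalty attractive for simultaneous dimension reduction and coefficient estimation.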

[1] H. Hotelling, The most predictable criterion, 1935.

[2] H. Hotelling, Relations between two sets of variates, 1936.

[3] T. W. Anderson, Estimating linear restrictions on regression coefficients for multivariate normal distributions, 1951.

[4] W. Massy, Principal components regression in exploratory statistical research, 1965.

[5] A. Izenman, Reduced-rank regression for the multivariate linear model, 1975.

[6] H. Wold, Soft modelling by latent variables: the non-linear iterative partial least squares (NIPALS) approach, 1975, Journal of Applied Probability.

[7] Y. Nesterov, A method for unconstrained convex minimization problem with the rate of convergence O(1/k^2), 1983.

[8] J. Schmee, An Introduction to Multivariate Statistical Analysis, 1986.

[9] C. R. Johnson et al., Topics in Matrix Analysis, 1991.

[10] J. Hiriart-Urruty et al., Convex Analysis and Minimization Algorithms, 1993.

[11] T. Hastie et al., A statistical view of some chemometrics regression tools: discussion, 1993.

[12] R. Brooks et al., Joint continuum regression for multiple predictands, 1994.

[13] C.-L. Tsai, Model selection for multivariate regression in small samples, 1994.

[14] P. H. C. Eilers, Flexible smoothing with B-splines and penalties, 1996.

[15] R. Tibshirani, Regression shrinkage and selection via the lasso, 1996.

[16] L. Breiman, Heuristics of instability and stabilization in model selection, 1996.

[17] T. Hastie et al., Predicting multivariate responses in multiple linear regression: discussion, 1997.

[18] Y. Fujikoshi et al., Modified AIC and Cp in multivariate linear regression, 1997.

[19] T. Fearn et al., Multivariate Bayesian variable selection and prediction, 1998.

[20] G. Reinsel et al., Multivariate Reduced-Rank Regression: Theory and Applications, 1998.

[21] G. Reinsel, Elements of Multivariate Time Series Analysis, 2nd edition, 1998.

[22] T. Fearn et al., The choice of variables in multivariate regression: a non-conjugate Bayesian decision theory approach, 1999.

[23] S. Bakin, Adaptive regression and model selection in data mining problems, 1999.

[24] D. Ruppert, Penalized regression splines, 1999.

[25] A. Nemirovski, Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, 2001, MPS-SIAM Series on Optimization.

[26] I. Johnstone, On the distribution of the largest eigenvalue in principal components analysis, 2001.

[27] S. P. Boyd et al., A rank minimization heuristic with application to minimum order system approximation, 2001, Proceedings of the 2001 American Control Conference.

[28] T. Fearn et al., Bayes model averaging with selection of regressors, 2002.

[29] K.-C. Toh et al., Solving semidefinite-quadratic-linear programs using SDPT3, 2003, Mathematical Programming.

[30] S. J. Wright et al., Simultaneous variable selection, 2005, Technometrics.

[31] Y. Nesterov, Smooth minimization of non-smooth functions, 2005, Mathematical Programming.

[32] P. Bühlmann et al., Boosting for high-multivariate responses in high-dimensional linear regression, 2006.

[33] M. Yuan et al., Model selection and estimation in regression with grouped variables, 2006.

[34] E. van den Berg et al., In pursuit of a root, 2007.

[35] M. Yuan et al., Dimension reduction and coefficient estimation in multivariate linear regression, 2007.

[36] G. H. Golub et al., Generalized cross-validation as a method for choosing a good ridge parameter, 1979, reprinted in Milestones in Matrix Computation.

[37] Z. Lu, Smooth optimization approach for covariance selection, 2007.

[38] F. R. Bach, Consistency of trace norm minimization, 2007, Journal of Machine Learning Research.

[39] Z. Lu, Smooth optimization approach for sparse covariance selection, 2008, SIAM Journal on Optimization.

[40] P. A. Parrilo et al., Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, 2007, SIAM Review.

[41] K.-C. Toh et al., On the implementation and usage of SDPT3, a Matlab software package for semidefinite-quadratic-linear programming, version 4.0, 2012.

[42] R. Tibshirani et al., Generalized additive models, 1986.

[43] K. Schittkowski et al., Nonlinear programming, 2022.