Unified Low-Rank Matrix Estimate via Penalized Matrix Least Squares Approximation

Low-rank matrix estimation arises in a number of statistical and machine learning tasks. In particular, in multivariate linear regression and multivariate quantile regression, the coefficient matrix is often assumed to have a low-rank structure. In this paper, we propose a method called penalized matrix least squares approximation (PMLSA) that yields a unified yet simple low-rank matrix estimate. Specifically, PMLSA transforms many different types of low-rank matrix estimation problems into asymptotically equivalent least-squares forms, which can be solved efficiently by the popular matrix fast iterative shrinkage-thresholding algorithm (FISTA). Furthermore, we derive the analytic degrees of freedom of PMLSA, with which a Bayesian information criterion (BIC)-type criterion is developed for selecting the tuning parameters. Under mild conditions, the rank estimated by the BIC-type criterion is shown to be asymptotically consistent for the true rank. Extensive experimental studies confirm these assertions.
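The abstract does not spell out the implementation, so the following is a minimal illustrative sketch rather than the paper's exact PMLSA procedure: it solves a nuclear-norm penalized least-squares problem, min_B 0.5*||Y - XB||_F^2 + lam*||B||_*, with the matrix FISTA the abstract refers to. The function names `svt` and `matrix_fista`, the squared Frobenius loss, and the fixed penalty level `lam` are assumptions made for illustration.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def matrix_fista(X, Y, lam, n_iter=500):
    """Minimize 0.5 * ||Y - X @ B||_F**2 + lam * ||B||_* by matrix FISTA."""
    L = np.linalg.norm(X, 2) ** 2            # Lipschitz constant of the gradient (squared spectral norm)
    B = np.zeros((X.shape[1], Y.shape[1]))
    Z, t = B.copy(), 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ Z - Y)             # gradient of the smooth least-squares part at Z
        B_new = svt(Z - grad / L, lam / L)   # gradient step followed by the SVT proximal step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Z = B_new + ((t - 1.0) / t_new) * (B_new - B)  # Nesterov momentum extrapolation
        B, t = B_new, t_new
    return B
```

In the method the abstract describes, a least-squares objective of this form would stand in for the original loss (for example, a quantile loss) through an asymptotically equivalent quadratic approximation, and `lam` would be chosen by the derived BIC-type criterion rather than fixed in advance.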
