Adaptive Reduced Rank Regression

Low rank regression has proven useful in a wide range of forecasting problems. However, in settings with a low signal-to-noise ratio, it is known to suffer from severe overfitting. This paper studies the reduced rank regression problem and presents algorithms with provable generalization guarantees. We use adaptive hard rank-thresholding in two different parts of the data analysis pipeline. First, we take a low-rank projection of the data to eliminate the components that are most likely to be noisy. Second, we apply a standard multivariate linear regression estimator to the data obtained in the first step and then take a low-rank projection of the resulting regression matrix. Both thresholding steps are performed in a data-driven manner and are required to prevent severe overfitting, as our lower bounds show. Experimental results show that our approach either outperforms or is competitive with existing baselines.
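The two-step pipeline described above can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: the abstract does not state how the adaptive thresholds are chosen, so the cutoffs `tau_x` and `tau_b`, the function name `two_step_rrr`, and the synthetic data are all hypothetical and are supplied by hand here.

```python
import numpy as np


def two_step_rrr(X, Y, tau_x, tau_b):
    """Illustrative two-step reduced rank regression with hard thresholds.

    tau_x and tau_b are hypothetical, user-supplied cutoffs; the paper
    selects its thresholds in a data-driven way not reproduced here.
    """
    # Step 1: low-rank projection of the data. Keep only the singular
    # components of X whose singular value exceeds tau_x.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k_x = int(np.sum(s > tau_x))

    # Least-squares fit on the projected data, written through the
    # truncated SVD: B = V_k diag(1/s_k) U_k^T Y.
    B_ols = Vt[:k_x].T @ ((U[:, :k_x].T @ Y) / s[:k_x, None])

    # Step 2: low-rank projection of the fitted regression matrix.
    Ub, sb, Vbt = np.linalg.svd(B_ols, full_matrices=False)
    k_b = int(np.sum(sb > tau_b))
    B_hat = (Ub[:, :k_b] * sb[:k_b]) @ Vbt[:k_b]
    return B_hat, k_x, k_b


# Synthetic example (assumed setup): a rank-2 coefficient matrix with
# singular values 3 and 2, observed through lightly noised responses.
rng = np.random.default_rng(0)
n, p, q = 200, 10, 5
A, _ = np.linalg.qr(rng.standard_normal((p, 2)))
C, _ = np.linalg.qr(rng.standard_normal((q, 2)))
B_true = A @ np.diag([3.0, 2.0]) @ C.T
X = rng.standard_normal((n, p))
Y = X @ B_true + 0.01 * rng.standard_normal((n, q))

B_hat, k_x, k_b = two_step_rrr(X, Y, tau_x=0.0, tau_b=0.1)
print("kept data components:", k_x, "| estimated rank:", k_b)
```

With `tau_x=0.0` the first step keeps every component of `X`, so the example mainly exercises the second thresholding step. Raising `tau_x` would also discard low-variance directions of the data before the regression is fit.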
