Prediction in latent factor regression: Adaptive PCR and beyond

This work is devoted to the finite sample prediction risk analysis of a class of linear predictors of a response $Y \in \mathbb{R}$ from a high-dimensional random vector $X \in \mathbb{R}^p$, when $(X, Y)$ follows a latent factor regression model generated by an unobservable latent vector $Z$ of dimension less than $p$. Our primary contribution lies in establishing finite sample risk bounds for prediction with the ubiquitous Principal Component Regression (PCR) method, under the factor regression model, with the number of principal components adaptively selected from the data---a form of theoretical guarantee that is surprisingly lacking in the PCR literature. To accomplish this, we prove a master theorem that establishes a risk bound for a large class of predictors, including the PCR predictor as a special case. This approach has the benefit of providing a unified framework for the analysis of a wide range of linear prediction methods under the factor regression setting. In particular, we use our main theorem to recover known risk bounds for the minimum-norm interpolating predictor, which has received renewed attention in the past two years, and for a prediction method tailored to a subclass of factor regression models with identifiable parameters. This model-tailored method can be interpreted as prediction via clusters with latent centers. To address the problem of selecting among a set of candidate predictors, we analyze a simple model selection procedure based on data splitting, providing an oracle inequality under the factor model to prove that the performance of the selected predictor is close to that of the optimal candidate. We conclude with a detailed simulation study that supports and complements our theoretical results.
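To make the objects discussed above concrete, the following is a minimal numpy sketch of the three ingredients the abstract names: PCR with a data-driven number of components, the minimum-norm interpolating predictor, and model selection by data splitting. The toy data-generating process, the eigenvalue-ratio selection rule (`eigenvalue_ratio_k`), and all dimensions are illustrative assumptions, not the paper's exact constructions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy factor regression data (illustrative, not the paper's design) ---
# X = Z A' + W  and  Y = Z beta + eps, with latent dimension K << p.
n, p, K = 200, 500, 5
Z = rng.normal(size=(n, K))                  # unobserved latent factors
A = rng.normal(size=(p, K))                  # loading matrix
X = Z @ A.T + 0.5 * rng.normal(size=(n, p))  # observed features
beta = rng.normal(size=K)
Y = Z @ beta + 0.1 * rng.normal(size=n)      # observed response


def eigenvalue_ratio_k(X, k_max=20):
    """Pick the number of components at the largest adjacent eigenvalue ratio
    of the sample covariance (one common adaptive rule; a stand-in here)."""
    lam = np.linalg.svd(X - X.mean(axis=0), compute_uv=False) ** 2
    return int(np.argmax(lam[:k_max] / lam[1:k_max + 1])) + 1


def pcr_predict(X_tr, Y_tr, X_te, k):
    """PCR with k components: regress Y on the top-k principal scores."""
    mu = X_tr.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_tr - mu, full_matrices=False)
    V_k = Vt[:k].T                            # p x k loadings
    theta, *_ = np.linalg.lstsq((X_tr - mu) @ V_k, Y_tr - Y_tr.mean(),
                                rcond=None)
    return Y_tr.mean() + (X_te - mu) @ V_k @ theta


def min_norm_predict(X_tr, Y_tr, X_te):
    """Minimum-l2-norm (ridgeless) least squares; interpolates when p > n."""
    return X_te @ (np.linalg.pinv(X_tr) @ Y_tr)


# --- Data-splitting model selection over the candidate predictors ---
idx = rng.permutation(n)
tr, va = idx[: n // 2], idx[n // 2:]
k_hat = eigenvalue_ratio_k(X[tr])
candidates = {
    f"PCR(k={k_hat})": lambda Xte: pcr_predict(X[tr], Y[tr], Xte, k_hat),
    "min-norm interpolator": lambda Xte: min_norm_predict(X[tr], Y[tr], Xte),
}
val_err = {name: f(X[va]) for name, f in candidates.items()}
val_err = {name: np.mean((pred - Y[va]) ** 2) for name, pred in val_err.items()}
print("selected:", min(val_err, key=val_err.get), val_err)
```

Note that the minimum-norm predictor interpolates the training data only in the overparametrized regime (here $p = 500 > n/2 = 100$), and the data-split selector simply returns the candidate with the smallest validation mean squared error, mirroring the oracle-inequality setup described above.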
