Estimating Random-X Prediction Error of Regression Models

The areas of model selection and model evaluation for predictive modeling have received extensive treatment in the statistics literature, leading to both theoretical advances and practical methods based on covariance penalties and other approaches. However, most of this work, and especially the practical methods, is based on the "Fixed-X" assumption, under which covariate values are treated as non-random and known. By contrast, in most modern predictive modeling applications it is more reasonable to take the "Random-X" view, where future prediction points are new and random. In this paper we examine the applicability of covariance-penalty approaches in this setting. We propose a decomposition of the Random-X prediction error that isolates the additional error due to random covariates, which appears in both the variance and bias components of the error. This decomposition is general, but we focus on its application to the fundamental case of least squares regression. We show how to quantify the excess variance, under some assumptions, using standard random-matrix results, leading to a covariance-penalty approach we term $RCp$. When the error variance is unknown, plugging in the standard unbiased estimate yields an approach we term $\hat{RCp}$, which is closely related to the existing methods MSEP and GCV. To account for the excess bias, we propose taking only the bias component of the ordinary cross-validation (OCV) estimate, resulting in a hybrid penalty we term $RCp^+$. Through theoretical analysis and simulations we demonstrate that this approach is consistently superior to OCV, although the difference is typically small.
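
For intuition, the following is a minimal Monte Carlo sketch, not the paper's code or formulas: it fits least squares on a Gaussian random design, compares the empirical Fixed-X error (new noise at the same design points) with the Random-X error (fresh covariates), and computes the standard GCV and leave-one-out OCV estimates via their closed forms for least squares. The data-generating model, dimensions, coefficients, and noise level are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma, n_rep = 100, 20, 1.0, 2000      # illustrative choices, not from the paper
beta = np.ones(p) / np.sqrt(p)

fixed_err, random_err, gcv_est, ocv_est = [], [], [], []
for _ in range(n_rep):
    X = rng.standard_normal((n, p))          # random design, redrawn each replication
    y = X @ beta + sigma * rng.standard_normal(n)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    resid = y - X @ beta_hat
    rss = resid @ resid

    # Fixed-X error: new noise at the SAME design points.
    y_new = X @ beta + sigma * rng.standard_normal(n)
    fixed_err.append(np.mean((y_new - X @ beta_hat) ** 2))

    # Random-X error: fresh covariates drawn from the same distribution.
    X_new = rng.standard_normal((n, p))
    y_star = X_new @ beta + sigma * rng.standard_normal(n)
    random_err.append(np.mean((y_star - X_new @ beta_hat) ** 2))

    # GCV and leave-one-out OCV, via the leverages h_ii of the hat matrix.
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)
    gcv_est.append((rss / n) / (1 - p / n) ** 2)
    ocv_est.append(np.mean((resid / (1 - h)) ** 2))

print("Fixed-X error :", np.mean(fixed_err))
print("Random-X error:", np.mean(random_err))  # larger: excess variance under Random-X
print("GCV estimate  :", np.mean(gcv_est))
print("OCV estimate  :", np.mean(ocv_est))
```

In this correctly specified setting only the excess variance contributes, and GCV and OCV track the Random-X error rather than the smaller Fixed-X error; the excess-bias issue addressed by $RCp^+$ arises only under model misspecification.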
