On the use of cross-validation to assess performance in multivariate prediction

We describe a Monte Carlo investigation of a number of variants of cross-validation for the assessment of performance of predictive models, including different values of k in leave-k-out cross-validation, and implementation either in a one-deep or a two-deep fashion. We assume an underlying linear model that is being fitted using either ridge regression or partial least squares, and vary a number of design factors such as sample size n relative to number of variables p, and error variance. The investigation encompasses both the non-singular (i.e. n > p) and the singular (i.e. n ≤ p) cases. The latter is now common in areas such as chemometrics but has as yet received little rigorous investigation. Results of the experiments enable us to reach some definite conclusions and to make some practical recommendations.
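The distinction between the one-deep and two-deep variants can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes that "one-deep" means choosing the ridge parameter by minimizing the cross-validated error and reporting that same minimum as the performance estimate, while "two-deep" means repeating the choice inside every outer training set, so that each held-out group plays no part in the selection. Leave-k-out is approximated here by disjoint random groups of size k rather than all subsets of size k. The function names, the candidate grid `lams`, and the simulated data are all hypothetical. Ridge regression is solved in its dual form so that the singular case (n ≤ p) needs no special handling.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge regression with an unpenalized intercept, solved in dual form.
    Requires lam > 0; works for both n > p and n <= p."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n = Xc.shape[0]
    # beta = Xc^T (Xc Xc^T + lam I)^{-1} yc
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), yc)
    beta = Xc.T @ alpha
    return beta, y_mean - x_mean @ beta

def make_folds(n, k, rng):
    """Split indices 0..n-1 into disjoint random groups of size k
    (the last group may be smaller)."""
    idx = rng.permutation(n)
    return [idx[i:i + k] for i in range(0, n, k)]

def cv_mse(X, y, lam, folds):
    """Mean squared prediction error over the held-out groups."""
    n = len(y)
    total = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        beta, b0 = ridge_fit(X[train], y[train], lam)
        total += np.sum((y[test] - (X[test] @ beta + b0)) ** 2)
    return total / n

def one_deep(X, y, lams, folds):
    """Select lam by CV and report the minimized CV error itself."""
    scores = [cv_mse(X, y, lam, folds) for lam in lams]
    best = int(np.argmin(scores))
    return lams[best], scores[best]

def two_deep(X, y, lams, k, rng):
    """Outer loop assesses; inner CV on each training set selects lam."""
    n = len(y)
    total = 0.0
    for test in make_folds(n, k, rng):
        train = np.setdiff1d(np.arange(n), test)
        inner_folds = make_folds(len(train), k, rng)
        lam, _ = one_deep(X[train], y[train], lams, inner_folds)
        beta, b0 = ridge_fit(X[train], y[train], lam)
        total += np.sum((y[test] - (X[test] @ beta + b0)) ** 2)
    return total / n

# Illustrative run on simulated data in the singular case (n <= p).
rng = np.random.default_rng(0)
n, p = 12, 20
X = rng.normal(size=(n, p))
y = X[:, :3].sum(axis=1) + 0.5 * rng.normal(size=n)
lams = [0.1, 1.0, 10.0]
chosen, est_one = one_deep(X, y, lams, make_folds(n, 3, rng))
est_two = two_deep(X, y, lams, 3, rng)
print("one-deep:", chosen, est_one)
print("two-deep:", est_two)
```

Under these assumptions the one-deep figure reuses the held-out data for both selection and assessment, whereas the two-deep figure keeps the two roles separate at the cost of an extra loop of model fits.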
