Methodology Review: Estimation of Population Validity and Cross-Validity, and the Use of Equal Weights in Prediction

In multiple regression, optimal linear weights are obtained using an ordinary least squares (OLS) procedure. However, these linearly weighted combinations of predictors may not optimally predict the same criterion in the population from which the sample was drawn (population validity) or in other samples drawn from the same population (population cross-validity). To estimate population validity and population cross-validity more accurately, some researchers and practitioners use formulas or traditional empirical methods. Others have suggested using the equal weights procedure as an alternative to the formula-based and empirical procedures. This review found that formula-based procedures can be used in place of empirical validation for estimating population validity or in place of empirical cross-validation for estimating population cross-validity. The equal weights procedure is a viable alternative when the observed multiple correlation is low to moderate and the variability among predictor-criterion correlations is low. Despite these findings, it is difficult to recommend one formula-based estimate over another because no single study has included all of the currently available formulas. Suggestions are offered for future research and application of these techniques.
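
As an illustration of the three approaches named above, the following is a minimal sketch, not the review's own procedure. It assumes the familiar adjusted R-squared correction, 1 - (1 - R^2)(N - 1)/(N - p - 1), as one example of a formula-based estimate of population validity; the review covers many other formulas that are not reproduced here. The simulated data, the function names, and the parameter values (`n`, `p`, the 0.3 slope) are all hypothetical choices made for the example.

```python
import numpy as np


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n cases and p predictors (one formula-based estimate)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def ols_weights(X, y):
    """Least-squares intercept and slopes for y regressed on the columns of X."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def ols_predict(beta, X):
    return beta[0] + X @ beta[1:]


def unit_weight_composite(X, mean, sd):
    """Equal weights: sum of predictors standardized with calibration statistics.

    Assumes every predictor is scored so that it relates positively
    to the criterion.
    """
    return ((X - mean) / sd).sum(axis=1)


def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])


def demo(n=40, p=4, seed=0):
    """Compare sample R^2, adjusted R^2, and empirical cross-validities."""
    rng = np.random.default_rng(seed)

    def draw(size):
        # Hypothetical population: uncorrelated predictors with equal slopes.
        X = rng.standard_normal((size, p))
        y = 0.3 * X.sum(axis=1) + rng.standard_normal(size)
        return X, y

    X_cal, y_cal = draw(n)      # calibration sample
    X_new, y_new = draw(5000)   # large fresh sample from the same population

    beta = ols_weights(X_cal, y_cal)
    r2_sample = corr(ols_predict(beta, X_cal), y_cal) ** 2
    mean, sd = X_cal.mean(axis=0), X_cal.std(axis=0, ddof=1)

    return {
        "r2_sample": r2_sample,
        "r2_adjusted": adjusted_r2(r2_sample, n, p),
        "r2_cross_ols": corr(ols_predict(beta, X_new), y_new) ** 2,
        "r2_cross_equal": corr(unit_weight_composite(X_new, mean, sd), y_new) ** 2,
    }


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.3f}")
```

In this sketch the fresh sample stands in for empirical cross-validation, and the simulated population is deliberately one where the predictor-criterion correlations are similar, which is the condition under which the abstract describes equal weights as viable.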
