Validation Strategies for Multiple Regression Analysis: Using the Coefficient of Determination

Multiple regression equations designed to explain or predict should be validated. This tutorial shows how recalculating the coefficient of determination on hold-out sample data or on new sample data can be used to improve regression equations and to test them for validity. The Herzberg equation serves as the criterion for acceptable shrinkage when the coefficient of determination is recalculated on new data. Nevertheless, validation remains an art rather than a science. Several choices can each lead to different decisions about whether an equation has been validated: eliminating unstable variables, choosing among different types of data splitting, using new sample data, and adjusting for external differences when test samples come from different time periods. Various strategies can be used to find effective validation techniques.
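The hold-out recalculation of the coefficient of determination can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the tutorial's own procedure. The abstract does not give the Herzberg equation, so the sketch substitutes Stein's formula for the expected cross-validated R² as the shrinkage benchmark. It also makes the following assumptions:

- Hold-out R² is measured as the squared correlation between observed and predicted values.
- The data are split at random into two halves.
- An equation is treated as validated when its hold-out R² is at least the benchmark value. This acceptance rule is hypothetical.

The helper names (`fit_ols`, `stein_expected_r2`, `holdout_validation`, and so on) and the synthetic data are illustrative only.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least-squares coefficients for y = b0 + X @ b (intercept first)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    """Apply a fitted equation to new rows of predictors."""
    return beta[0] + X @ beta[1:]


def r_squared(y, y_hat):
    """In-sample coefficient of determination, 1 - SSE/SST."""
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - sse / sst


def holdout_r_squared(y, y_hat):
    """Hold-out R^2, taken here as the squared correlation of observed and predicted y."""
    return np.corrcoef(y, y_hat)[0, 1] ** 2


def stein_expected_r2(r2, n, k):
    """Stein's formula: expected cross-validated R^2 for n cases and k predictors.

    Used as a stand-in for the Herzberg equation, whose exact form is not given here.
    """
    return 1.0 - ((n - 1) / (n - k - 1)) * ((n - 2) / (n - k - 2)) * ((n + 1) / n) * (1.0 - r2)


def split_halves(n, seed=0):
    """Randomly split case indices 0..n-1 into two (nearly) equal halves."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, 2)


def holdout_validation(X, y, seed=0):
    """Fit on one random half, recompute R^2 on the other, compare with Stein's benchmark."""
    est, val = split_halves(len(y), seed)
    beta = fit_ols(X[est], y[est])
    r2_fit = r_squared(y[est], predict(beta, X[est]))
    r2_holdout = holdout_r_squared(y[val], predict(beta, X[val]))
    r2_expected = stein_expected_r2(r2_fit, len(est), X.shape[1])
    # Hypothetical acceptance rule: observed shrinkage no worse than expected shrinkage.
    return r2_fit, r2_holdout, r2_expected, r2_holdout >= r2_expected


def double_cross_validation(X, y, seed=0):
    """Each half validates the equation estimated on the other half."""
    a, b = split_halves(len(y), seed)
    return [
        holdout_r_squared(y[val], predict(fit_ols(X[est], y[est]), X[val]))
        for est, val in ((a, b), (b, a))
    ]


# Synthetic example: 200 cases, 3 predictors, known coefficients plus unit-variance noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=200)

r2_fit, r2_holdout, r2_expected, validated = holdout_validation(X, y)
print(f"R^2 estimation half: {r2_fit:.3f}")
print(f"R^2 hold-out half:   {r2_holdout:.3f}")
print(f"Stein expected R^2:  {r2_expected:.3f}  validated: {validated}")
print("double cross-validation R^2:", [round(float(v), 3) for v in double_cross_validation(X, y)])
```

Changing `seed` changes which cases fall in each half, and the validation decision can change with it. This illustrates how different types of data splitting can lead to different conclusions about the same equation.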

[1] B. Efron et al. A Leisurely Look at the Bootstrap, the Jackknife, and …, 1983.

[2] Cuthbert Daniel et al. Fitting Equations to Data: Computer Analysis of Multifactor Data, 1980.

[3] Marion G. Sobol. GPA, GMAT, and SCALE: A Method for Quantification of Admissions Criteria, 1984.

[4] C. I. Mosier. I. Problems and Designs of Cross-Validation, 1951.

[5] D. Aaker et al. Chapter 2 – Marketing Research, 2004.

[6] David M. Levine et al. Intermediate Statistical Methods and Applications: A Computer Package Approach, 1982.

[7] R. Snee et al. Ridge Regression in Practice, 1975.

[8] Ronald D. Snee et al. Validation of Regression Models: Methods and Examples, 1977.

[9] Richard F. Deckro et al. M.B.A. Admission Criteria and Academic Success, 1977.

[10] R. Wherry et al. A New Formula for Predicting the Shrinkage of the Coefficient of Multiple Correlation, 1931.

[11] P. Herzberg. The Parameters of Cross-Validation, 1967.

[12] G. Chow. Tests of Equality Between Sets of Coefficients in Two Linear Regressions (Econometrica, vol. 28), 1960.

[13] M. Stone. Cross-Validatory Choice and Assessment of Statistical Predictions, 1976.

[14] Philip J. McCarthy et al. The Use of Balanced Half-Sample Replication in Cross-Validation Studies, 1976.

[15] S. Larson. The Shrinkage of the Coefficient of Multiple Correlation, 1931.

[16] Ronald G. Askin. Multicollinearity in Regression: Review and Examples, 1982.

[17] P. Diaconis et al. Computer-Intensive Methods in Statistics, 1983.

[18] Neil J. Dorans et al. A Note on Cross-Validating Prediction Equations, 1980.

[19] J. Stevens. Applied Multivariate Statistics for the Social Sciences, 1986.