Estimators of the Squared Cross-Validity Coefficient: A Monte Carlo Investigation

A Monte Carlo experiment was used to evaluate four procedures for estimating the population squared cross-validity of a sample least squares regression equation. Four levels of the population squared multiple correlation ($R_p^2$) and three levels of the number of predictors ($n$) were factorially crossed to produce 12 population covariance matrices. Random samples at four levels of sample size ($N$) were drawn from each population. The levels of $N$, $n$, and $R_p^2$ were carefully selected so that the simulation results would be relevant to much applied research. The least squares regression equation from each sample was applied in its respective population to obtain the actual population squared cross-validity ($R_{cv}^2$). Estimates of $R_{cv}^2$ were computed using three formula estimators and the double cross-validation procedure. The results of the experiment demonstrate that two estimators that have previously been advocated in the literature were negatively biased and exhibited poor accuracy. The negative bias of these two estimators increased as $R_p^2$ decreased and as the ratio of $N$ to $n$ decreased. As a consequence, their biases were most evident in small samples, where cross-validation is imperative. In contrast, the third estimator was quite accurate and virtually unbiased within the scope of this simulation. This third estimator is recommended for applied settings that are adequately approximated by the correlation model.
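The core of the design can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the authors' code. It assumes a hypothetical population of `n_pred` uncorrelated, standardized predictors, each with validity $\sqrt{R_p^2 / n}$, so that the population squared multiple correlation equals $R_p^2$; the abstract does not say how the 12 covariance matrices were actually built. The sketch shows the two quantities the abstract describes: the actual $R_{cv}^2$, obtained by applying the sample weights $b$ in the population,

$$R_{cv}^2 = \frac{(b^\top \sigma_{xy})^2}{(b^\top \Sigma_{xx} b)\,\sigma_y^2},$$

and a double cross-validation estimate. The three formula estimators are not implemented because the abstract does not name them. All function names here are hypothetical.

```python
import numpy as np


def make_population(rho2, n_pred):
    """Hypothetical population: uncorrelated standardized predictors with
    equal validities sqrt(rho2 / n_pred), so sxy' Sxx^{-1} sxy = rho2."""
    sxx = np.eye(n_pred)
    sxy = np.full(n_pred, np.sqrt(rho2 / n_pred))
    syy = 1.0
    return sxx, sxy, syy


def population_cross_validity(b, sxx, sxy, syy):
    """Squared population correlation between y and the composite x'b.
    The intercept is irrelevant: correlation ignores additive constants."""
    cov_yhat_y = b @ sxy
    var_yhat = b @ sxx @ b
    return float(cov_yhat_y ** 2 / (var_yhat * syy))


def ols_weights(X, y):
    """Least squares slopes from a fit that includes an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]


def double_cross_validation(X, y, rng):
    """Randomly split the sample in half, fit on each half, correlate the
    predictions with observed y in the other half, and average the two
    squared correlations."""
    idx = rng.permutation(len(y))
    half = len(y) // 2
    h1, h2 = idx[:half], idx[half:]
    r2 = []
    for fit, hold in ((h1, h2), (h2, h1)):
        b = ols_weights(X[fit], y[fit])
        yhat = X[hold] @ b
        r2.append(np.corrcoef(yhat, y[hold])[0, 1] ** 2)
    return float(np.mean(r2))


def simulate(rho2, n_pred, N, reps, seed=0):
    """Mean actual R_cv^2 and mean double cross-validation estimate for one
    cell (rho2, n_pred, N) of a factorial design like the one described."""
    rng = np.random.default_rng(seed)
    sxx, sxy, syy = make_population(rho2, n_pred)
    # Joint covariance of (x_1, ..., x_n, y); positive definite when rho2 < 1.
    cov = np.block([[sxx, sxy[:, None]],
                    [sxy[None, :], np.array([[syy]])]])
    rcv, dcv = [], []
    for _ in range(reps):
        data = rng.multivariate_normal(np.zeros(n_pred + 1), cov, size=N)
        X, y = data[:, :-1], data[:, -1]
        b = ols_weights(X, y)
        rcv.append(population_cross_validity(b, sxx, sxy, syy))
        dcv.append(double_cross_validation(X, y, rng))
    return float(np.mean(rcv)), float(np.mean(dcv))
```

For example, `simulate(rho2=0.25, n_pred=4, N=40, reps=200)` returns the average actual $R_{cv}^2$ and the average double cross-validation estimate for that cell; their difference is the estimated bias. By the Cauchy-Schwarz inequality, any sample weight vector gives $R_{cv}^2 \le R_p^2$, with equality only for weights proportional to the population weights.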