SELECTING THE “BEST” REGRESSION WHEN FACED WITH MISSING OBSERVATIONS

The sample multiple correlation coefficient is often used to select the subset of independent variables that “best” predicts a dependent variable, Y. If the data are partially missing, the choice of best predictors should often reflect not only how strongly the predictors are correlated with Y but also how likely they are to be observed. Thus, an independent variable that is highly correlated with Y but is difficult to record (i.e., is often missing) may not be as useful a predictor of Y as a less correlated but easily recorded independent variable. A generalization of the multiple correlation coefficient is defined that is appropriate when there are missing values and is identical to the multiple correlation coefficient when there are no missing values. An example of its use is presented.
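
The trade-off described above can be illustrated with a minimal sketch. It does not implement the generalized coefficient, whose definition is not given in this abstract; it only reports, for each subset of predictors, the ordinary sample multiple correlation coefficient computed on complete cases together with the fraction of cases in which the whole subset is observed. The function names (`multiple_r`, `subset_summary`), the simulated data, and the 60% missingness rate are assumptions made for illustration.

```python
import itertools
import numpy as np

def multiple_r(X, y):
    """Ordinary sample multiple correlation coefficient of y on the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])      # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares fit
    fitted = A @ beta
    # R equals the correlation between the fitted values and y.
    return float(np.corrcoef(fitted, y)[0, 1])

def subset_summary(X, y):
    """For every non-empty predictor subset, return
    (column indices, R on complete cases, fraction of complete cases)."""
    n, p = X.shape
    rows = []
    for k in range(1, p + 1):
        for cols in itertools.combinations(range(p), k):
            idx = list(cols)
            # A case is complete if y and every predictor in the subset are observed.
            complete = ~np.isnan(X[:, idx]).any(axis=1) & ~np.isnan(y)
            r = multiple_r(X[complete][:, idx], y[complete])
            rows.append((cols, r, float(complete.mean())))
    return rows

# Simulated data (assumption): x1 is strongly related to y but often missing,
# x2 is weakly related to y but always observed.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])
X[rng.random(n) < 0.6, 0] = np.nan   # about 60% of x1 values are missing

summary = subset_summary(X, y)
for cols, r, frac in summary:
    print(f"predictors {cols}: R = {r:.3f}, complete cases = {frac:.2f}")
```

Reading the two columns side by side shows why the ordinary coefficient alone can mislead: a subset containing the often-missing predictor may have the larger R while being usable for only a minority of cases.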