Classification Tree Prediction Models for Dental Caries from Clinical, Microbiological, and Interview Data

Caries prediction by Classification And Regression Tree (CART) analysis is an appropriate and powerful alternative or complement to the commonly used classification methods of logistic regression and discriminant analysis (the latter in both its parametric and nonparametric forms). The binary classification tree method discussed in this article is designed for complex data and does not require assumptions about the predictor variables or about the presence or absence of interactions among them. Furthermore, the results give insight into the structures and interactions in the data and are easy to interpret and apply. In preliminary applications of the CART algorithms to data from The University of North Carolina Caries Risk Assessment Study, the method produced prediction rules having sensitivities and specificities that were similar to or slightly better than those associated with logistic and discriminant analyses. The classification trees constructed tended to involve far fewer predictor variables than were required for adequate logistic and discriminant models. For example, for first-grade children in Aiken, South Carolina, nine variables were used to define a prediction rule having 64% sensitivity and 86% specificity. Ten-fold cross-validation estimates for future data were 58% and 79%, respectively. For first-grade children in Portland, Maine, two variables were used to define a prediction rule having 62% sensitivity and 77% specificity. The cross-validation estimates for future data were 58% and 78%, respectively. A brief explanation of the CART method, previously unavailable, is given for the special case of a dichotomous outcome variable.
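
As an illustration of the ideas summarized above, the following minimal Python sketch shows one binary split of the kind a classification tree makes for a dichotomous outcome, and how sensitivity and specificity are computed for the resulting rule. It is not the study's implementation: the Gini impurity criterion is a common CART choice that is assumed here rather than taken from the text, and the predictor name `baseline_score`, the data values, and all function names are hypothetical and invented purely for illustration.

```python
def gini(labels):
    """Gini impurity 2p(1-p) of a list of 0/1 outcome labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n          # proportion with outcome 1
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Search midpoints between distinct predictor values and return
    (threshold, weighted_impurity) for the rule 'value <= threshold'.
    Returns (None, root_impurity) if no split improves on the root."""
    n = len(values)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


def sensitivity_specificity(actual, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data (not from the study): one predictor per child
# and a 0/1 outcome (1 = developed caries).
baseline_score = [0, 0, 1, 1, 2, 3, 4, 5, 6, 8]
caries = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

threshold, score = best_split(baseline_score, caries)

# One-split prediction rule: predict caries when the score exceeds the threshold.
predicted = [1 if x > threshold else 0 for x in baseline_score]
sens, spec = sensitivity_specificity(caries, predicted)

print(f"split at {threshold}, weighted Gini {score:.3f}")
print(f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```

A full tree would apply the same search recursively within each branch and across all predictors. The figures printed here are resubstitution estimates on the toy data; cross-validation estimates such as those quoted above would instead be obtained by fitting the rule on part of the data and scoring it on the held-out part.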