The use of discriminant analysis, logistic regression and classification tree analysis in the development of classification models for human health effects

Abstract

This paper describes the statistical techniques of discriminant analysis, logistic regression and classification tree (CT) analysis, which can be used to develop classification models (CMs) for predicting the membership of chemicals in two or more pre-defined groups, such as toxicological categories. One difference between the three methods is that discriminant analysis and logistic regression make a number of assumptions about the underlying data, whereas CT analysis is a non-parametric technique. Another difference is that discriminant analysis and logistic regression can be used to derive probabilities of group membership for individual chemicals, whereas CT analysis produces only average probabilities for the different groups. The application of the three techniques is illustrated by comparing the CMs obtained from each when applied to the same eye irritation data set.
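The contrast between individual and average probabilities can be made concrete with a minimal sketch. It is not the authors' procedure: the data are invented toy values (a single hypothetical descriptor and an irritant / non-irritant label), logistic regression stands in for both parametric methods, and the classification tree is reduced to a single split chosen by Gini impurity. All function names are illustrative.

```python
import math

# Hypothetical toy data, NOT the eye irritation data set:
# (descriptor value, label) with 1 = irritant, 0 = non-irritant.
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
        (2.5, 1), (3.0, 0), (3.5, 1), (4.0, 1)]


def logistic_probability(x, intercept, slope):
    """Probability of class 1 for one chemical under a logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


def fit_logistic(points, learning_rate=0.1, epochs=5000):
    """Fit intercept and slope by plain gradient ascent on the log-likelihood."""
    intercept = slope = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_intercept = grad_slope = 0.0
        for x, y in points:
            residual = y - logistic_probability(x, intercept, slope)
            grad_intercept += residual
            grad_slope += residual * x
        intercept += learning_rate * grad_intercept / n
        slope += learning_rate * grad_slope / n
    return intercept, slope


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(points):
    """One-split tree: pick the threshold with the lowest weighted Gini impurity.

    Returns (threshold, proportion of class 1 in left leaf, same for right leaf).
    """
    xs = sorted(x for x, _ in points)
    best = None
    for low, high in zip(xs, xs[1:]):
        threshold = (low + high) / 2.0
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(points)
        if best is None or score < best[0]:
            best = (score, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    return best[1], best[2], best[3]


def tree_probability(x, threshold, p_left, p_right):
    """Every chemical in a leaf receives that leaf's class proportion."""
    return p_left if x <= threshold else p_right


intercept, slope = fit_logistic(data)
threshold, p_left, p_right = fit_stump(data)

for x, _ in data:
    print(f"x={x:.1f}  logistic={logistic_probability(x, intercept, slope):.3f}  "
          f"tree={tree_probability(x, threshold, p_left, p_right):.3f}")
```

On this toy data the logistic model gives each chemical its own probability, which rises with the descriptor value. The one-split tree gives the same value to every chemical that falls in the same leaf, namely the observed proportion of irritants in that leaf.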