Graphical Detection of Regression Outliers and Mixtures

Regressions in practice can include outliers and other unknown subpopulation structure. For example, mixtures of regressions occur if there is an omitted categorical predictor such as gender, species or location, and different regressions occur within each category. A lurking variable, one that has an important effect but is not present among the predictors under consideration (Box 1966), can seriously complicate a regression analysis. Regression structure with lurking variables is illustrated in Figure 1a, which is a stylized representation of subpopulation structures in a regression with response Y and predictors Xk. The contours A, C and E represent different subpopulation regressions. Point B represents an isolated outlier, while the circular contours D represent an outlying cluster. The regression illustrated in the figure consists of a mixture of five distinct regressions, one for each of the four subpopulations and one for the isolated outlier.
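
To make the idea of a mixture of regressions concrete, the following is a minimal simulation sketch, not a method taken from this paper. It assumes a single predictor, two hypothetical subpopulations labeled A and C with opposite slopes, and one isolated outlier labeled B. The coefficients, noise level, sample sizes and the helper names `ols` and `simulate` are illustrative assumptions. The sketch compares a pooled least-squares fit, which ignores the omitted category, with separate fits within each category.

```python
import random


def ols(x, y):
    """Least-squares (intercept, slope) for a single predictor."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def simulate(n_per_group=100, seed=0):
    """Return (label, x, y) triples from two regressions plus one outlier."""
    rng = random.Random(seed)
    # Hypothetical (intercept, slope) for each subpopulation; assumed values.
    lines = {"A": (0.0, 1.0), "C": (5.0, -1.0)}
    data = []
    for label, (b0, b1) in lines.items():
        for _ in range(n_per_group):
            x = rng.uniform(0.0, 4.0)
            data.append((label, x, b0 + b1 * x + rng.gauss(0.0, 0.3)))
    # One isolated outlier, far above both regression lines.
    data.append(("B", 2.0, 12.0))
    return data


data = simulate()

# Pooled fit: the category label plays the role of a lurking variable.
pooled = ols([x for _, x, _ in data], [y for _, _, y in data])

# Separate fits within each category recover the subpopulation regressions.
within = {
    label: ols(
        [x for g, x, _ in data if g == label],
        [y for g, _, y in data if g == label],
    )
    for label in ("A", "C")
}

print("pooled (intercept, slope):", pooled)
for label, fit in within.items():
    print(label, "(intercept, slope):", fit)
```

Under these assumptions the within-category slopes should be close to +1 and -1, while the pooled slope should be close to zero, so a fit that ignores the category would suggest little or no dependence of Y on the predictor.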