Fast and robust discriminant analysis

The goal of discriminant analysis is to obtain rules that describe the separation between groups of observations. These rules also make it possible to classify new observations into one of the known groups. In the classical approach, discriminant rules are often based on the empirical mean and covariance matrix of the data, or of parts of the data. Because these estimates are highly influenced by outlying observations, they become inappropriate for contaminated data sets. Robust discriminant rules are obtained by inserting robust estimates of location and scatter into generalized maximum likelihood rules for normal distributions. This approach makes it possible to discriminate between several populations, with equal or unequal covariance structure, and with equal or unequal membership probabilities. In particular, the highly robust minimum covariance determinant (MCD) estimator is used, because it can be computed very quickly for large data sets. The probability of misclassification is also estimated in a robust way. The performance of the new method is investigated through several simulations and by applying it to some real data sets.
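
The plug-in idea described above can be illustrated with a minimal sketch. It assumes the unequal-covariance (quadratic) normal rule, in which an observation x is assigned to the group j that maximizes -0.5 ln|S_j| - 0.5 (x - m_j)' S_j^{-1} (x - m_j) + ln p_j, with m_j and S_j replaced by robust estimates. The function names `mcd_sketch` and `robust_qda_classify` are hypothetical, and the estimator shown is a crude random-start concentration-step approximation to the MCD, not the fast algorithm used in the paper; it omits reweighting and consistency correction.

```python
import numpy as np


def mcd_sketch(X, h_frac=0.75, n_starts=20, n_steps=20, seed=0):
    """Crude MCD approximation (illustrative assumption, not FAST-MCD).

    Searches for the h-subset whose sample covariance has the smallest
    determinant, using random starts followed by concentration steps.
    Returns the mean and covariance of the best subset found.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    h = max(int(h_frac * n), p + 1)
    best_det, best_mu, best_S = np.inf, None, None
    for _ in range(n_starts):
        # Start from a small random subset of p + 1 points.
        idx = rng.choice(n, size=p + 1, replace=False)
        for _ in range(n_steps):
            mu = X[idx].mean(axis=0)
            # Small ridge keeps the start covariance invertible.
            S = np.cov(X[idx], rowvar=False) + 1e-9 * np.eye(p)
            diff = X - mu
            d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(S), diff)
            # Concentration step: keep the h points closest to the fit.
            new_idx = np.argsort(d2)[:h]
            if set(new_idx) == set(idx):
                break
            idx = new_idx
        mu = X[idx].mean(axis=0)
        S = np.cov(X[idx], rowvar=False)
        det = np.linalg.det(S)
        if det < best_det:
            best_det, best_mu, best_S = det, mu, S
    return best_mu, best_S


def robust_qda_classify(x, params, priors):
    """Assign x to the group with the largest quadratic discriminant score.

    params is a list of (location, scatter) pairs, one per group;
    priors holds the membership probabilities.
    """
    scores = []
    for (mu, S), prior in zip(params, priors):
        diff = x - mu
        _, logdet = np.linalg.slogdet(S)
        maha = diff @ np.linalg.solve(S, diff)
        scores.append(-0.5 * logdet - 0.5 * maha + np.log(prior))
    return int(np.argmax(scores))


# Synthetic data (made up for illustration): group 0 has 10% outliers.
rng = np.random.default_rng(1)
clean0 = rng.normal(0.0, 1.0, size=(90, 2))
outliers = rng.normal(0.0, 1.0, size=(10, 2)) + np.array([30.0, -30.0])
group0 = np.vstack([clean0, outliers])
group1 = rng.normal(0.0, 1.0, size=(100, 2)) + np.array([6.0, 6.0])

params = [mcd_sketch(group0), mcd_sketch(group1)]
priors = [0.5, 0.5]
label_a = robust_qda_classify(np.array([0.0, 0.0]), params, priors)
label_b = robust_qda_classify(np.array([6.0, 6.0]), params, priors)
```

In this sketch the outliers pull the classical mean of group 0 far from the bulk of the data, while the subset-based estimate stays close to it, which is the motivation for substituting robust estimates into the rule.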
