Projection-pursuit approach to robust linear discriminant analysis

Discriminant analysis plays an important role in multivariate statistics as a prediction and classification method, and it has been applied successfully in many fields of work and research. Like other multivariate methods, it is highly vulnerable to the outliers that commonly occur in real-world data sets. The lack of robustness of the classical estimators on which the linear discriminant function is based is a severe disadvantage, and several authors have sought efficient ways to prevent the damage that outliers can cause. This paper focuses on the projection-pursuit approach to discriminant analysis. The projection-pursuit estimators are described, and their theoretical properties are derived and their relevance highlighted: Fisher consistency, affine equivariance, partial influence functions and asymptotic distributions. An application to real data and a simulation study illustrate the robustness of the projection-pursuit approach. In both analyses the data involve a large number of variables, a situation that is becoming common as new technology is applied to data gathering.
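
As a rough illustration of the projection-pursuit idea for two groups, and not a reproduction of the estimators studied in the paper, the sketch below searches for a direction along which the projected groups are well separated when separation is measured with robust univariate statistics. Everything specific in it is an assumption made for the example: the median and the MAD as location and scale estimators, the pooled form of the index, the random search over unit directions, the simulated data, and the helper names mad, pp_index, pp_direction and classify.

```python
import numpy as np

def mad(z):
    """Median absolute deviation, scaled for consistency at the normal."""
    return 1.4826 * np.median(np.abs(z - np.median(z)))

def pp_index(a, x1, x2):
    """Robust separation of two groups projected onto direction a.

    Assumed form: squared difference of projected medians divided by a
    pooled squared MAD. Other robust location/scale pairs could be used.
    """
    z1, z2 = x1 @ a, x2 @ a
    n1, n2 = len(z1), len(z2)
    pooled_var = (n1 * mad(z1) ** 2 + n2 * mad(z2) ** 2) / (n1 + n2)
    return (np.median(z1) - np.median(z2)) ** 2 / pooled_var

def pp_direction(x1, x2, n_candidates=2000, seed=0):
    """Crude random search for the unit direction maximizing pp_index."""
    rng = np.random.default_rng(seed)
    cands = rng.standard_normal((n_candidates, x1.shape[1]))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)
    values = np.array([pp_index(a, x1, x2) for a in cands])
    return cands[np.argmax(values)], values.max()

def classify(x, a, x1, x2):
    """Assign each row of x to group 1 or 2 by nearest projected median."""
    z = x @ a
    m1, m2 = np.median(x1 @ a), np.median(x2 @ a)
    return np.where(np.abs(z - m1) <= np.abs(z - m2), 1, 2)

# Simulated example: two bivariate normal groups separated along the
# first coordinate, with five gross outliers planted in group 1.
rng = np.random.default_rng(1)
x1 = rng.standard_normal((100, 2))
x2 = rng.standard_normal((100, 2)) + np.array([4.0, 0.0])
x1[:5] = [50.0, 50.0]

a, value = pp_direction(x1, x2)
clean = np.vstack([x1[5:], x2])
truth = np.r_[np.ones(95, dtype=int), 2 * np.ones(100, dtype=int)]
accuracy = np.mean(classify(clean, a, x1, x2) == truth)
print("direction:", a, "index:", value, "accuracy on clean points:", accuracy)
```

Because the index depends on the data only through medians and MADs of the projections, the few planted outliers have little effect on the chosen direction. A random search is used only to keep the example short; a numerical optimizer could replace it.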
