Robust kernel principal component analysis and classification

Kernel principal component analysis (KPCA) extends linear PCA from a real vector space to any high-dimensional kernel feature space. The sensitivity of linear PCA to outliers is well known, and various robust alternatives have been proposed in the literature. For KPCA, such robust versions have received considerably less attention. In this article we present kernel versions of three robust PCA algorithms: spherical PCA, projection pursuit and ROBPCA. These robust KPCA algorithms are analyzed in a classification context by applying discriminant analysis to the KPCA scores. The performance of the different robust KPCA algorithms is assessed in a simulation study that compares misclassification percentages on both clean and contaminated data. An outlier map is constructed to visualize outliers in such classification problems. A real-life example from protein classification illustrates the usefulness of robust KPCA and its corresponding outlier map.
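
To make the notion of KPCA scores concrete, the following is a minimal sketch of the classical (non-robust) KPCA step only. It is not one of the robust algorithms presented in the article. The Gaussian (RBF) kernel, the bandwidth `sigma`, the function names `rbf_kernel` and `kpca_scores`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y.

    The kernel choice and the bandwidth sigma are illustrative assumptions.
    """
    sq_dist = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def kpca_scores(K, n_components):
    """Classical KPCA scores from an n x n kernel matrix K.

    Returns the scores (n x n_components) and the corresponding
    eigenvalues of the centered kernel matrix.
    """
    n = K.shape[0]
    # Centering in feature space: Kc = (I - 11'/n) K (I - 11'/n).
    J = np.eye(n) - np.full((n, n), 1.0 / n)
    Kc = J @ K @ J
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    lam = eigvals[order]
    # Scale coefficients so each feature-space direction has unit norm.
    alpha = eigvecs[:, order] / np.sqrt(lam)
    return Kc @ alpha, lam


# Simulated, uncontaminated data purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
K = rbf_kernel(X, X)
scores, lam = kpca_scores(K, n_components=2)
```

In a classification setting, the rows of `scores` would serve as the inputs to a discriminant analysis. Because this sketch relies on the ordinary eigendecomposition of the centered kernel matrix, a few outlying observations can dominate the leading components, which is the sensitivity that the robust variants are meant to address.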
