Outlier Detection in the Framework of Dimensionality Reduction

We propose an effective outlier detection algorithm for high-dimensional data. We consider manifold models of data, as is commonly assumed in dimensionality reduction and manifold learning: the data set is a noisy sample from a low-dimensional manifold embedded in a high-dimensional data space. Our algorithm uses the local geometric structure of the data to determine the inliers, from which the outliers are then identified. The algorithm is applicable to both linear and nonlinear models of data. We also discuss several implementation issues and present examples that demonstrate the effectiveness of the new approach.
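The abstract does not specify how local geometric structure separates inliers from outliers. The sketch below shows one common way to make that idea concrete. It is our own illustration under stated assumptions and is not the authors' algorithm. It uses a local tangent-space residual: for each point, fit a d-dimensional affine subspace to its k nearest neighbors by PCA, then measure how far the point lies from that subspace. Points near the manifold sit close to their local tangent space, and off-manifold points do not. The function names `local_tangent_residuals` and `flag_outliers`, the parameters `k` and `d` (the intrinsic dimension is assumed to be known), and the median-plus-MAD threshold are all hypothetical choices.

```python
import numpy as np


def local_tangent_residuals(X, k=10, d=1):
    """Hypothetical helper: distance of each row of X to a local d-dim PCA fit.

    For every point, its k nearest neighbors (the point itself excluded) are
    centered, their top-d principal directions span an estimated tangent
    space, and the score is the norm of the point's offset orthogonal to it.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances; O(n^2) memory is fine for a sketch.
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    scores = np.empty(n)
    for i in range(n):
        order = np.argsort(D2[i])
        nbrs = order[order != i][:k]  # k nearest neighbors, skipping i itself
        N = X[nbrs]
        mu = N.mean(axis=0)
        # Rows of Vt are the principal directions of the centered neighborhood.
        _, _, Vt = np.linalg.svd(N - mu, full_matrices=False)
        T = Vt[:d]  # d x D orthonormal basis of the estimated tangent space
        r = X[i] - mu
        scores[i] = np.linalg.norm(r - T.T @ (T @ r))  # orthogonal residual
    return scores


def flag_outliers(scores, c=3.0):
    """Hypothetical rule: flag scores above median + c * (MAD scaled to a std)."""
    med = np.median(scores)
    mad = np.median(np.abs(scores - med))
    return scores > med + c * 1.4826 * mad


# Usage: a noisy circle (a 1-D manifold) in R^3 plus five off-manifold points.
rng = np.random.default_rng(0)
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)])
circle += 0.005 * rng.standard_normal(circle.shape)
phi = np.array([0.0, 1.3, 2.6, 3.9, 5.2])
z = np.array([0.5, -0.5, 0.6, -0.6, 0.4])
outliers = np.column_stack([np.cos(phi), np.sin(phi), z])
X = np.vstack([circle, outliers])  # rows 200..204 are the planted outliers

scores = local_tangent_residuals(X, k=10, d=1)
flags = flag_outliers(scores)
inliers = np.flatnonzero(~flags)  # indices kept as inliers
```

Each point is excluded from its own neighborhood, so an outlier cannot pull the local fit toward itself. If the data lie on a linear subspace, every local tangent space approximates that same subspace, and the score reduces to the ordinary PCA reconstruction error. This is one way a local scheme can cover both linear and nonlinear models.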
