Local subspace-based outlier detection using global neighbourhoods

Outlier detection in high-dimensional data is a challenging yet important task, as it has applications in, e.g., fraud detection and quality control. State-of-the-art density-based algorithms perform well because they 1) take the local neighbourhoods of data points into account and 2) consider feature subspaces. In highly complex and high-dimensional data, however, existing methods are likely to overlook important outliers because they do not explicitly take into account that the data is often a mixture distribution of multiple components. We therefore introduce GLOSS, an algorithm that performs local subspace outlier detection using global neighbourhoods. Experiments on synthetic data demonstrate that GLOSS more accurately detects local outliers in mixed data than its competitors. Moreover, experiments on real-world data show that our approach identifies relevant outliers overlooked by existing methods, confirming that one should keep an eye on the global perspective even when doing local outlier detection.
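To make the motivation concrete, the following minimal sketch shows why locality matters when the data is a mixture of components with different densities. It is not GLOSS: it is a simplified density-ratio score in the spirit of density-based methods such as LOF, and the function names, the toy data, and the choice k = 3 are all assumptions made for illustration. Each point's mean distance to its k nearest neighbours is compared with the same quantity for those neighbours, so a point is judged relative to its own surroundings rather than against the whole dataset.

```python
import math


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding i itself."""
    ranked = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return [j for _, j in ranked[:k]]


def mean_knn_distance(points, i, k):
    """Average distance from points[i] to its k nearest neighbours."""
    return sum(math.dist(points[i], points[j]) for j in knn_indices(points, i, k)) / k


def local_density_ratio(points, k=3):
    """Return (spread, ratios): raw mean k-NN distances and their local ratios.

    ratios[i] divides a point's mean k-NN distance by the average of the same
    quantity over its neighbours (a simplified stand-in for LOF, not LOF itself).
    """
    spread = [mean_knn_distance(points, i, k) for i in range(len(points))]
    ratios = []
    for i in range(len(points)):
        neighbours = knn_indices(points, i, k)
        reference = sum(spread[j] for j in neighbours) / k
        ratios.append(spread[i] / reference)
    return spread, ratios


# Toy mixture (made up): a dense component, a sparse component, one stray point
# that sits just outside the dense component.
dense = [(0.1 * x, 0.1 * y) for x in range(4) for y in range(4)]
sparse = [(20.0 + 2.0 * x, 20.0 + 2.0 * y) for x in range(4) for y in range(4)]
stray = (0.8, 0.8)
points = dense + sparse + [stray]
stray_index = len(points) - 1

spread, ratios = local_density_ratio(points, k=3)
top_global = max(range(len(points)), key=lambda i: spread[i])
top_local = max(range(len(points)), key=lambda i: ratios[i])
print("highest raw k-NN distance:", points[top_global])
print("highest local ratio:", points[top_local])
```

In this toy setting, ranking by the raw k-NN distance puts a member of the sparse component first, whereas the local ratio singles out the stray point next to the dense component. The sketch works in the full feature space only; it does not cover the subspace search or the global neighbourhoods described in the abstract.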
