Improved outlier detection using sparse coding-based methods

Abstract: Outlier detection is an active area of research in data mining, and a large number of algorithms exist. Our goal is to provide a guideline for choosing the most appropriate outlier detection algorithm for a given dataset without exploiting any domain- or application-specific information. Extensive experiments with a number of state-of-the-art algorithms on thousands of benchmark datasets revealed a clear trend. For datasets with low dimensionality and a low difficulty level, traditional methods outperform sparse coding-based outlier detection (SCOD) algorithms, but the trend reverses as the dimensionality or the difficulty level increases. The point where the trends for SCOD and traditional algorithms intersect defines a threshold: 250 for dimensionality and 21 for difficulty level.
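The guideline in the abstract can be read as a simple decision rule. The Python sketch below is illustrative only and is not the authors' code. The function names `recommend_family` and `trend_intersection` are hypothetical, as is the choice to recommend traditional methods when a value sits exactly on a threshold. The abstract does not say how the difficulty level is measured, so the sketch assumes it is supplied as a number on the same scale as the reported threshold of 21. The second function shows how a crossover point could be obtained if each trend were approximated by a straight line, which is also an assumption.

```python
from typing import Tuple

# Thresholds reported in the abstract.
DIMENSIONALITY_THRESHOLD = 250
DIFFICULTY_THRESHOLD = 21


def recommend_family(dimensionality: int, difficulty: float) -> str:
    """Return "SCOD" or "traditional" for a dataset.

    Assumption: exceeding either threshold is enough to favour SCOD,
    and values exactly at a threshold fall back to traditional methods.
    """
    if dimensionality > DIMENSIONALITY_THRESHOLD or difficulty > DIFFICULTY_THRESHOLD:
        return "SCOD"
    return "traditional"


def trend_intersection(trend_a: Tuple[float, float],
                       trend_b: Tuple[float, float]) -> float:
    """Return the x value where two linear trends cross.

    Each trend is an (intercept, slope) pair, i.e. y = intercept + slope * x.
    Assumption: the performance trends are approximated by straight lines.
    """
    (a1, b1), (a2, b2) = trend_a, trend_b
    if b1 == b2:
        raise ValueError("parallel trends never intersect")
    return (a2 - a1) / (b1 - b2)
```

For example, `recommend_family(40, 10)` returns `"traditional"`, while `recommend_family(784, 10)` returns `"SCOD"` because the dimensionality exceeds 250.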
