Applying Anomaly Pattern Score for Outlier Detection

Outlier detection is an important subfield of data mining that has been studied intensively over the past decades. For neighborhood-based outlier detection methods such as KNN and LOF, the number of neighbors (the parameter $k$) greatly affects the model's performance. Several recent studies therefore focus on identifying the optimal value of $k$ by analyzing the global or local structure of the dataset. We argue, however, that a neighborhood-based outlier detection model can improve its performance without parameter tuning. In this paper, we take a different approach: we adopt a uniform sampling strategy to generate a series of local proximity graphs and propose a new adaptive outlier detection model, named anomaly pattern score, that does not rely on tuning $k$. We also provide a theoretical analysis of the effectiveness of the proposed model. Extensive experiments on both synthetic and real-world datasets show that the proposed model outperforms state-of-the-art algorithms on most datasets.
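To make the sensitivity to $k$ concrete, the following minimal sketch scores points by the distance to their $k$-th nearest neighbor, a common KNN-style outlier score. It is not the anomaly pattern score proposed in the paper; the function names (`knn_outlier_scores`, `top_outliers`, `make_toy_data`) and the toy dataset are assumptions made purely for illustration.

```python
import numpy as np


def knn_outlier_scores(X, k):
    """Score each point by the distance to its k-th nearest neighbor."""
    # Pairwise Euclidean distances, shape (n, n).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # A point must not count as its own neighbor.
    np.fill_diagonal(dists, np.inf)
    # k-th smallest distance in each row (k is 1-based).
    return np.partition(dists, k - 1, axis=1)[:, k - 1]


def top_outliers(X, k, m):
    """Return the indices of the m highest-scoring points as a set."""
    scores = knn_outlier_scores(X, k)
    return {int(i) for i in np.argsort(scores)[-m:]}


def make_toy_data(seed=0):
    """Hypothetical data: one Gaussian cluster plus a tight group of 3 anomalies."""
    rng = np.random.default_rng(seed)
    cluster = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
    planted = np.array([[8.0, 8.0], [8.1, 8.0], [8.0, 8.1]])
    # Planted anomalies get indices 100, 101, 102.
    return np.vstack([cluster, planted])


if __name__ == "__main__":
    X = make_toy_data()
    for k in (1, 2, 3, 5, 10):
        print(f"k={k}: top-3 indices = {sorted(top_outliers(X, k, 3))}")
```

In this toy setting the three planted anomalies sit close to one another, so for small $k$ each finds its neighbors inside the group and receives a low score, while for larger $k$ the neighborhood must reach into the main cluster and the score becomes large. This is the kind of dependence on $k$ that motivates approaches which avoid tuning it.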
