ODRA: an outlier detection algorithm based on relevant attribute analysis method

Advances in data acquisition have generated enormous amounts of data capturing business, commercial, technological and scientific information. Even in very large datasets, however, some occurrences are rare or unusual. In data mining, these rare occurrences are usually referred to as outliers or anomalies. They are infrequent by nature: their share of the data can range from 0.01% to 10%, depending on the type of application. In recent years, outlier detection has become important in many applications and has attracted considerable attention within data mining. This attention has produced several outlier detection algorithms, mostly based on distance or density. However, each family has inherent weaknesses: distance-based methods have problems with varying local density, and density-based methods have problems with low-density patterns. In this paper, we present a new outlier detection algorithm based on relevant attribute analysis (ODRA) for local outlier detection in high-dimensional datasets. The proposed algorithm has two phases. In the first phase, a data reduction method shrinks the dataset by pruning irrelevant attributes and data points. In the second phase, outliers are detected with a method based on k-NN kernel density estimation. Experimental results on 15 datasets from the UCI machine learning repository show the effectiveness of the proposed approach and its superiority over state-of-the-art outlier detection methods.
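To make the second phase more concrete, the following is a minimal sketch of outlier scoring with a k-NN kernel density estimate. It is not the ODRA algorithm itself, whose details the abstract does not give. The function name `knn_kde_outlier_scores`, the Gaussian kernel, the bandwidth (distance to the k-th nearest neighbor) and the score (mean neighbor density divided by the point's own density) are all assumptions chosen for illustration. The first phase (attribute and data point pruning) is omitted because the abstract does not state the relevance criterion.

```python
import math


def knn_kde_outlier_scores(points, k=3):
    """Return one score per point; a larger score suggests a stronger outlier.

    Illustrative assumptions (not taken from the paper): Gaussian kernel,
    bandwidth equal to the distance to the k-th nearest neighbor, and a
    score defined as mean neighbor density over the point's own density.
    """
    n = len(points)
    dim = len(points[0])
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    neighbors = []      # indices of the k nearest neighbors of each point
    log_density = []    # log of the k-NN kernel density estimate
    for i, p in enumerate(points):
        # Brute-force neighbor search: O(n^2) overall, fine for a sketch.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )[:k]
        # Adaptive bandwidth; the floor guards against duplicate points.
        h = max(dists[-1][0], 1e-12)
        kernel_sum = sum(math.exp(-0.5 * (d / h) ** 2) for d, _ in dists)
        # Work in log space so h ** dim cannot underflow in high dimensions.
        log_norm = dim * math.log(h * math.sqrt(2.0 * math.pi))
        log_density.append(math.log(kernel_sum / k) - log_norm)
        neighbors.append([j for _, j in dists])

    scores = []
    for i in range(n):
        # Ratio of each neighbor's density to the point's own density.
        ratios = (math.exp(log_density[j] - log_density[i])
                  for j in neighbors[i])
        scores.append(sum(ratios) / k)
    return scores


# Toy data: five clustered points and one distant point (index 5).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
          (10.0, 10.0)]
scores = knn_kde_outlier_scores(points, k=3)
most_outlying = scores.index(max(scores))
```

On this toy data the distant point receives the largest score, because its neighbors sit in a region that is far denser than its own. Scores near 1 indicate a point whose density resembles that of its neighbors.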
