Outlier detection based on approximation accuracy entropy

Proximity-based outlier detection methods have recently received much attention. For a given object x, a proximity-based method usually measures the degree of outlierness of x by examining the nearest-neighbor structure of x, where the size of the nearest neighborhood must be specified in advance by the user. However, it is difficult for users to choose an appropriate neighborhood size. To solve this problem, we present in this paper an approximation accuracy entropy-based outlier detection algorithm, called ODAAE, within the framework of rough sets. Approximation accuracy entropy (AAE) is an extension of Shannon information entropy in rough sets. To quantify the degree of outlierness of any given object, we develop a measure called the AAE-based outlier factor. Experimental results on real-world data sets show that the proposed algorithm is effective for outlier detection.
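As background for the rough-set terms used above, the following is a minimal sketch of the two standard ingredients that the name "approximation accuracy entropy" refers to: Pawlak's approximation accuracy of a set (the size of its lower approximation divided by the size of its upper approximation) and the Shannon entropy of the partition induced by an attribute subset. It is not the ODAAE algorithm or the AAE-based outlier factor, whose definitions are not given in this abstract. The toy table, the attribute names, and the function names `partition`, `approximation_accuracy`, and `shannon_entropy` are assumptions made for illustration only.

```python
from collections import defaultdict
from math import log2


def partition(rows, attrs):
    """Group object indices into equivalence classes: two objects fall in
    the same class when they agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def approximation_accuracy(blocks, target):
    """Pawlak's approximation accuracy |lower(X)| / |upper(X)|."""
    lower, upper = set(), set()
    for block in blocks:
        if block <= target:   # block lies entirely inside X
            lower |= block
        if block & target:    # block overlaps X
            upper |= block
    # Convention assumed here: an empty upper approximation gives accuracy 1.
    return len(lower) / len(upper) if upper else 1.0


def shannon_entropy(blocks, n):
    """Shannon entropy (in bits) of a partition of n objects."""
    return -sum((len(b) / n) * log2(len(b) / n) for b in blocks)


# Hypothetical toy information table with six objects.
rows = [
    {"color": "red", "size": "small"},
    {"color": "red", "size": "small"},
    {"color": "blue", "size": "small"},
    {"color": "blue", "size": "large"},
    {"color": "blue", "size": "large"},
    {"color": "green", "size": "large"},
]

blocks = partition(rows, ["color"])
accuracy = approximation_accuracy(blocks, target={0, 1, 2})
entropy = shannon_entropy(blocks, len(rows))
```

In this toy table, partitioning by `color` yields the classes {0, 1}, {2, 3, 4}, and {5}. The target set {0, 1, 2} has lower approximation {0, 1} and upper approximation {0, 1, 2, 3, 4}, so its approximation accuracy is 2/5. Neither quantity requires a neighborhood size to be chosen.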
