Comparative Study of Outlier Detection Algorithms

As the dimensionality of data keeps growing, outlier detection has become an active area of research. The central problem is finding outliers in large data sets. An outlier is a pattern that differs from the rest of the patterns in the data set. Detecting outliers is important because it yields useful information that supports further data analysis. Many algorithms have been proposed to date for this task. This paper studies several classes of outlier detection algorithms, including statistical-based, depth-based, clustering-based, and density-based methods. These methods are compared to determine which are more applicable to high-dimensional data.
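
To make the notion of "a pattern that differs from the rest" concrete, the following is a minimal sketch of the simplest statistical-based approach: a z-score test on one-dimensional data. It is an illustration only, not the method evaluated in this paper; the function name `zscore_outliers`, the sample values, and the threshold of 2.0 are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return indices of values whose |z-score| exceeds the threshold.

    Hypothetical helper for illustration; the threshold is an assumed
    default, not a value taken from the paper.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Seven readings near 10 and one far away (made-up sample data).
readings = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 25.0]
print(zscore_outliers(readings))
```

A test of this kind assumes roughly normal, low-dimensional data, and the outlier itself inflates the mean and standard deviation it is measured against. Limitations like these are part of what motivates the depth-based, clustering-based, and density-based alternatives.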
