A STUDY ON DIFFERENT APPROACHES OF OUTLIER DETECTION IN DATA MINING

Data mining is the process of extracting knowledge from large databases. Knowledge is now widely regarded as a source of power and as an important factor in the success of any organization, because it shapes the roles of the people working in that organization. Outlier detection is an important task in data mining and has many real-time applications. Most real-time data contains some unwanted or unrelated values, generally termed "outliers". Separating out the outliers improves the quality of the data and thereby increases accuracy. An outlier may be an individual value or a group of values, depending on the data and the application. Outliers occur for various reasons, such as automatic faults, behavioral changes in the system, human error, irrelevant data, and instrument faults. This paper presents an overview of outlier concepts, a taxonomy of approaches, and a review of outlier detection algorithms and techniques.
