A Survey On Outlier Detection Technique In Streaming Data Using Data Clustering Approach

Data mining is a heavily researched area because data is a crucial part of many applications, and many researchers have taken an interest in this domain. The need to process large data sets poses a number of challenges. Filtered data, that is, data free from noisy attributes, is important for obtaining accurate results, so finding and eliminating noisy objects has gained considerable importance. An object that does not follow the pattern of the usual data objects is called an outlier. Outlier detection is used in numerous applications, such as fraud detection, intrusion detection systems, tracking environmental activities, and healthcare diagnosis. A number of approaches are used to detect outliers. Most of them combine cluster-based and distance-based techniques (for example, the K-Means algorithm with Euclidean distance). Clustering first groups similar data items from the data set, and distance-based calculations are then applied to detect the outliers, which is why these methods are called cluster-based outlier detection. K-Means and Euclidean distance are the most common and popular choices for clustering and outlier detection because of their simplicity and efficiency. Different application areas of outlier detection are discussed in this paper.
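The cluster-then-distance idea described above can be illustrated with a minimal sketch. It is not the method of any particular surveyed paper. It assumes a plain batch K-Means with a deterministic initialisation (the first k points) and a hypothetical flagging rule: a point is an outlier when its Euclidean distance to its own centroid exceeds `factor` times the mean distance within that cluster. The names `kmeans`, `cluster_outliers`, and `factor` are illustrative only.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain batch K-Means; returns (centroids, labels)."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: the first k points serve as initial centroids (deterministic).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels


def cluster_outliers(points, k, factor=2.0):
    """Flag points lying unusually far from their own cluster centroid."""
    centroids, labels = kmeans(points, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    outliers = []
    for c in range(k):
        member_dists = [d for d, lab in zip(dists, labels) if lab == c]
        if not member_dists:
            continue
        # Hypothetical rule: threshold is a multiple of the mean distance.
        threshold = factor * sum(member_dists) / len(member_dists)
        outliers.extend(
            p for p, d, lab in zip(points, dists, labels)
            if lab == c and d > threshold
        )
    return outliers
```

For streaming data, a sketch like this would typically be run on one window or chunk of the stream at a time. The threshold rule and the value of `factor` are tuning choices rather than fixed parts of the approach.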
