Comparative Analysis of Clustering Algorithms for Outlier Detection in Data Streams

Data mining has become one of the most popular research areas in computer science because its techniques extract hidden knowledge from large databases. Most work in data mining emphasizes knowledge discovery, and data stream mining is becoming an active research area within this domain. A data stream resembles a river: a continuous, massive sequence of data elements flows in and out, generated at a rapid rate. The analysis of data streams has recently attracted attention in the data mining research community. When the amount of data is very large, it raises numerous computational and mining challenges because of hardware and software limitations. Data mining techniques newly proposed for data streams include data stream clustering, data stream classification, frequent pattern techniques, and sliding window techniques. Data stream clustering is particularly needed for outlier detection. The main objective of this research work is to cluster data streams and detect the outliers in them. Two clustering algorithms, namely BIRCH with CLARANS and CURE with CLARANS, are used for finding the outliers in data streams. Data sets of different types and sizes and two performance factors, clustering accuracy and outlier detection accuracy, are used for the analysis. The experimental results show that CURE with CLARANS is more accurate than BIRCH with CLARANS.
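As a rough illustration of clustering-based outlier detection on a stream, the sketch below processes a stream in fixed-size windows, runs a CLARANS-style randomized medoid search on each window, and flags points that lie unusually far from their nearest medoid. It is a minimal sketch under stated assumptions, not the procedure used in this work: the BIRCH and CURE stages are omitted, and the windowing scheme, the distance cutoff (mean plus `z` standard deviations), the `detection_rate` measure, the parameter values, and the synthetic data are all hypothetical choices made for the example.

```python
import math
import random
from statistics import mean, pstdev


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def clarans(points, k, numlocal=3, maxneighbor=50, seed=0):
    """CLARANS-style randomized search; returns k medoid indices."""
    rng = random.Random(seed)
    best, best_cost = None, math.inf
    for _ in range(numlocal):  # independent restarts
        current = rng.sample(range(len(points)), k)
        cost = total_cost(points, current)
        tried = 0
        while tried < maxneighbor:
            # A neighbor differs from the current solution in one medoid.
            non_medoids = [i for i in range(len(points)) if i not in current]
            candidate = list(current)
            candidate[rng.randrange(k)] = rng.choice(non_medoids)
            cand_cost = total_cost(points, candidate)
            if cand_cost < cost:
                current, cost, tried = candidate, cand_cost, 0
            else:
                tried += 1
        if cost < best_cost:
            best, best_cost = current, cost
    return best


def flag_outliers(points, medoids, z=2.0):
    """Assumed rule: flag points far from their nearest medoid."""
    d = [min(math.dist(p, points[m]) for m in medoids) for p in points]
    cutoff = mean(d) + z * pstdev(d)
    return [i for i, di in enumerate(d) if di > cutoff]


def detection_rate(flagged, truth):
    """Assumed measure: fraction of known outliers that were flagged."""
    return len(set(flagged) & set(truth)) / len(truth)


def windows(stream, size):
    """Yield consecutive fixed-size chunks of a stream."""
    chunk = []
    for item in stream:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def make_window(rng, planted):
    """Synthetic data: two tight groups of 20 points plus planted outliers."""
    pts = [(cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5))
           for cx, cy in [(0.0, 0.0), (10.0, 10.0)] for _ in range(20)]
    return pts + planted


data_rng = random.Random(1)
stream = (make_window(data_rng, [(50.0, 50.0), (-40.0, 30.0)])
          + make_window(data_rng, [(60.0, -20.0), (-30.0, -30.0)]))

# Cluster each window of 42 points and collect the flagged indices.
results = [flag_outliers(w, clarans(w, k=2)) for w in windows(stream, 42)]
print(results)
```

In this synthetic setup the planted outliers occupy positions 40 and 41 of each window, so `detection_rate(results[0], [40, 41])` gives a simple per-window score. Real streams would need a cutoff and window size tuned to the data.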
