Clustering techniques for streaming data: a survey

Many applications now generate streaming data, for example real-time surveillance, internet traffic, sensor data, health monitoring systems, communication networks, and online transactions in financial markets. A data stream is a temporally ordered, fast-changing, massive, and potentially infinite sequence of data. Mining data streams is challenging because their tremendous volume and high arrival rate make it impossible to store the data and scan it multiple times. Concept evolution in streaming data further magnifies the challenge. Clustering is a data stream mining task that is useful for gaining insight into the data and its characteristics. It is also used as a pre-processing step in the overall mining process, for example for outlier detection and for building classification models. In this paper we focus on the challenges and necessary features of data stream clustering techniques; review and compare the literature on data stream clustering by example and by variable; and describe some real-world applications of, and tools for, data stream clustering.
