A Novel Algorithm for Adaptive Data Stream Clustering

In recent years, the processing and management of data streams has become a topic of active research in several fields of computer science. A data stream is a continuously growing sequence of time-stamped data. Data streams arise in many applications, such as network monitoring, telecommunication systems, stock markets, customer click streams, and multi-sensor systems of any type. Because of the large number of such applications, data stream clustering has become an important technique in data mining and knowledge discovery. STREAM is a data stream clustering algorithm that divides the data into chunks, clusters each chunk, and then clusters the resulting centers. An important limitation of STREAM is that it does not adapt to an evolving data stream: it is not sensitive to changes in the underlying data. In many cases, the patterns in the underlying stream evolve and change significantly over time. It is therefore critical for the clustering process to adapt to such changes and to provide insights over different time horizons. In this paper, we propose an improved STREAM clustering method that makes the algorithm adaptive to drift by adjusting itself as the data stream changes.
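The chunk-then-recluster scheme attributed to STREAM above can be illustrated with a minimal Python sketch. It is only an approximation under stated assumptions: weighted k-means (Lloyd iterations) is swapped in as the clustering subroutine, points are tuples of floats, and the names `weighted_kmeans`, `stream_cluster`, and `chunk_size` are hypothetical, not taken from the paper. The sketch covers the baseline two-level procedure only; it does not implement the adaptive extension proposed here.

```python
import random
from itertools import islice


def _dist2(p, q):
    # Squared Euclidean distance between two points given as tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))


def _nearest(p, centers):
    # Index of the center closest to p (first index wins on ties).
    return min(range(len(centers)), key=lambda j: _dist2(p, centers[j]))


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Weighted Lloyd iterations (an assumed stand-in subroutine).

    Returns (centers, masses), where masses[j] is the total weight
    of the points assigned to centers[j].
    """
    rng = random.Random(seed)
    k = min(k, len(points))
    centers = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(iters):
        assign = [_nearest(p, centers) for p in points]
        for j in range(k):
            members = [(p, w) for p, w, a in zip(points, weights, assign) if a == j]
            if not members:
                continue  # keep an empty cluster's center where it is
            total = sum(w for _, w in members)
            dim = len(centers[j])
            centers[j] = tuple(
                sum(p[d] * w for p, w in members) / total for d in range(dim)
            )
    masses = [0.0] * k
    for p, w in zip(points, weights):
        masses[_nearest(p, centers)] += w
    return centers, masses


def stream_cluster(stream, k, chunk_size=50):
    """Cluster each chunk, then cluster the weighted chunk centers."""
    it = iter(stream)
    inter_points, inter_weights = [], []
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        centers, masses = weighted_kmeans(chunk, [1.0] * len(chunk), k)
        for c, m in zip(centers, masses):
            if m > 0:  # drop centers that ended up with no points
                inter_points.append(c)
                inter_weights.append(m)
    if not inter_points:
        return [], []
    # Second level: each chunk center counts as many times as its mass.
    return weighted_kmeans(inter_points, inter_weights, k)
```

Calling `stream_cluster(points, k=2)` on any iterable of equal-length tuples returns the final centers together with the number of original points each one represents. Because every chunk is summarized once and never revisited, the result reflects the whole stream equally, which is the behaviour the abstract identifies as insensitive to drift.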
