Tweet Cluster Analyzer: Partition and Join-based Micro-clustering for Twitter Data Stream

Data stream mining is the process of extracting knowledge from continuously generated data. Because processing such streams is not a trivial task, they have to be analyzed with appropriate stream mining techniques. In many large-volume data stream processing applications, stream clustering helps to uncover valuable hidden information. Many works have concentrated on clustering data streams using various methods, but most of those approaches neglect core tasks needed to improve cluster accuracy and to process the streams quickly. To improve cluster quality and reduce the processing time required for cluster generation, a partition-based DBStream clustering method is proposed. The results have been compared with those of various data stream clustering methods, and the experiments show that cluster purity improves by 5% and that the time taken is 10% lower than the average time taken by the other methods for clustering the data streams.
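The abstract reports cluster purity without defining it. As a minimal sketch, assuming the commonly used definition (each cluster is credited with its most frequent ground-truth label, and the credited points are divided by the total number of points), purity could be computed as follows. The function name `cluster_purity` and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Return cluster purity under the standard majority-label definition.

    Assumption: this is the usual textbook definition; the paper's exact
    formula is not given in the abstract.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if len(cluster_ids) == 0:
        raise ValueError("purity is undefined for an empty assignment")

    # Count ground-truth labels inside each cluster.
    per_cluster = {}
    for cluster, label in zip(cluster_ids, true_labels):
        per_cluster.setdefault(cluster, Counter())[label] += 1

    # Each cluster contributes the size of its majority label.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical example: 8 points in 3 clusters, 6 of them match the
# majority label of their cluster.
example_clusters = [0, 0, 0, 1, 1, 2, 2, 2]
example_labels = ["a", "a", "b", "b", "b", "c", "a", "c"]
example_purity = cluster_purity(example_clusters, example_labels)  # 6 / 8
```

Under this definition, purity lies in (0, 1] and rises as clusters become more homogeneous. It also rises trivially as the number of clusters grows, so it is usually read alongside the cluster count.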
