Consensus Clustering of Tweet Networks via Semantic and Sentiment Similarity Estimation

Although Twitter has become an important source of information, the number of accessible tweets is too large for users to easily find the information they want. To overcome this difficulty, this paper proposes a method for tweet clustering. Because network representations have been reported to be useful for multimedia content analysis, including clustering, a network-based approach is employed. Specifically, a consensus clustering method is newly derived for tweet networks that represent relationships among the tweets' semantics and sentiment. The proposed method integrates multiple clustering results obtained by applying established clustering methods to the tweet networks. Integrating complementary clustering results obtained from semantic and sentiment features makes accurate clustering of tweets feasible. The contribution of this work lies in its use of these features, which distinguishes it from existing network-based consensus clustering methods that target only the network structure. Experimental results on a real-world Twitter collection, comprising 65 553 tweets in 25 datasets, verify the effectiveness of the proposed method.
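
The idea of integrating several clustering results can be illustrated with a generic co-association scheme. The following is a minimal sketch and not the method derived in the paper: it assumes each base clustering is given as a list of labels, counts how often two tweets share a cluster, and groups tweets whose agreement exceeds a threshold. The function names, the threshold of 0.5, and the toy label lists are all hypothetical and chosen only for illustration.

```python
import numpy as np


def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    labels = np.asarray(partitions)  # shape: (n_partitions, n_items)
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)


def consensus_labels(partitions, threshold=0.5):
    """Link items whose co-association exceeds the threshold and
    return the connected components as consensus clusters."""
    co = co_association(partitions)
    n = co.shape[0]
    adjacency = co > threshold
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        # Depth-first search over the thresholded agreement graph.
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in np.flatnonzero(adjacency[node]):
                if labels[neighbor] == -1:
                    labels[neighbor] = current
                    stack.append(int(neighbor))
        current += 1
    return labels


# Toy input (assumed): cluster labels for six tweets from three base
# clusterings, two on a semantic network and one on a sentiment network.
semantic_a = [0, 0, 0, 1, 1, 1]
semantic_b = [0, 0, 1, 1, 2, 2]
sentiment = [0, 0, 0, 0, 1, 1]

result = consensus_labels([semantic_a, semantic_b, sentiment])
print(result)
```

In this toy case, no single base clustering matches the consensus grouping exactly; the grouping emerges only from pairs on which a majority of the clusterings agree, which is the sense in which the base results are complementary.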
