Analysis of Twitter Data Using a Multiple-level Clustering Strategy

Twitter, currently the leading microblogging social network, has attracted a large body of research. This paper proposes a data analysis framework to discover groups of similar Twitter messages posted about a given event. By analyzing these groups, user emotions or thoughts that appear to be associated with specific events can be extracted, as well as the aspects that characterize events according to user perception. To deal with the inherent sparseness of micro-messages, the proposed approach relies on a multiple-level strategy that allows clustering text data with a variable distribution. Clusters are then characterized through the most representative words appearing in their messages, and association rules are used to highlight correlations among these words. To measure the relevance of specific words for a given event, text data are represented in the Vector Space Model using the TF-IDF weighting score. As a case study, two real Twitter datasets have been analysed.
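As an illustration of the TF-IDF weighting mentioned above, the following is a minimal sketch and not the paper's implementation. It assumes whitespace-tokenized messages, term frequency normalized by message length, and an unsmoothed inverse document frequency log(N/df); the function name `tf_idf` and the sample messages are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical messages about a single event.
tweets = [
    "great concert tonight".split(),
    "concert tickets sold out".split(),
    "traffic near the concert".split(),
]
vectors = tf_idf(tweets)
```

Under this variant, a word that occurs in every message (here "concert") receives weight zero, while words specific to a few messages receive positive weights, which is why the score can indicate how relevant a word is for characterizing a group of messages.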
