Tweets clustering using latent semantic analysis

Social media are becoming overloaded with information as the number of information feeds grows. Unlike other social media, Twitter lets users broadcast short messages called ‘tweets’. In this study, we extract tweets related to MH370 posted over a certain period of time, and we present an overview of our approach to clustering these tweets in order to analyze users’ responses to the MH370 tragedy. The tweets were clustered based on the frequency of terms obtained from the classification process. The method we used for text classification is Latent Semantic Analysis. As a result, we identify two types of tweets responding to the MH370 tragedy: emotional and non-emotional. We show some of our initial results to demonstrate the effectiveness of our approach.
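The pipeline described above (term frequencies, Latent Semantic Analysis, then clustering) can be illustrated with a minimal sketch. It is not the study's implementation. The six tweets are invented toy strings, and whitespace tokenization, the two latent dimensions, the power-iteration route to the singular vectors, and the small 2-means step are all assumptions made for illustration. The function names `term_document_matrix`, `lsa_document_coords`, and `two_means` are hypothetical.

```python
import math
import random
from collections import Counter


def term_document_matrix(docs):
    """Return (vocab, A) where A[t][d] is the count of term t in document d."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for c in counts] for term in vocab]


def lsa_document_coords(A, k=2, iters=300, seed=0):
    """Project documents onto the top-k latent dimensions of A.

    Runs power iteration with deflation on the Gram matrix G = A^T A, whose
    eigenvectors are the right singular vectors of A. Each document d gets
    the coordinates sigma_i * v_i[d] for i = 1..k.
    """
    n = len(A[0])
    G = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[] for _ in range(n)]
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigval = sum(v[i] * G[i][j] * v[j] for i in range(n) for j in range(n))
        sigma = math.sqrt(max(eigval, 0.0))
        for d in range(n):
            coords[d].append(sigma * v[d])
        # Deflate G so the next pass converges to the next singular vector.
        G = [[G[i][j] - eigval * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


def two_means(points, iters=20):
    """Tiny 2-cluster k-means with a deterministic farthest-point start."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = [points[0], max(points, key=lambda p: dist2(p, points[0]))]
    labels = []
    for _ in range(iters):
        labels = [min((0, 1), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented toy tweets (not data from the study): three emotional, three not.
tweets = [
    "pray for mh370 so sad",
    "so sad heartbroken pray for families",
    "heartbroken sad pray mh370",
    "mh370 search area expanded indian ocean",
    "search teams scan indian ocean area",
    "satellite data search area ocean mh370",
]

vocab, A = term_document_matrix(tweets)
coords = lsa_document_coords(A, k=2)
labels = two_means(coords)
print(labels)
```

On this toy input the first three tweets and the last three should fall into separate clusters, because each group shares most of its vocabulary. Real tweets would need at least stop-word removal, handling of hashtags and URLs, and a weighting scheme such as TF-IDF before the decomposition.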
