Analyzing Relationship of Words Using Biased LexRank from Geotagged Tweets

Place is an important factor in how strongly two words are related. For example, in places where tourists can view Tokyo Tower and cherry blossoms at the same time, the two words can be considered related, whereas in other places they would not be. Based on this hypothesis, we propose a method that extracts the relationships between words within an area from geotagged texts obtained from Twitter. To extract the words relevant to a word posted at each place, our approach propagates relevance through co-occurrence, so that words co-occurring with a word's co-occurring words are also taken into account. We apply Biased LexRank, an algorithm adapted from PageRank, to a graph constructed from the co-occurrence relationships in each area. We also identify places that feature two characteristic objects, based on the relationship between the corresponding words. Finally, we visualize and discuss such places, for example places where one can appreciate "Tokyo Tower" and "cherry blossom" together.
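The propagation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes tweets from a single area have already been tokenized, uses raw co-occurrence counts as edge weights, and places the entire bias on one query word, which makes it a random walk with restart in the spirit of Biased LexRank. The names `cooccurrence_graph`, `biased_rank`, `damping`, and the sample tweets are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(tweets):
    """Build symmetric co-occurrence counts from tokenized tweets of one area."""
    weights = defaultdict(lambda: defaultdict(float))
    for tokens in tweets:
        # Count each unordered pair of distinct words once per tweet.
        for a, b in combinations(sorted(set(tokens)), 2):
            weights[a][b] += 1.0
            weights[b][a] += 1.0
    return weights


def biased_rank(weights, query, damping=0.85, iterations=100):
    """Score every word by a random walk that restarts at `query`.

    Assumption: all of the bias mass sits on the query word; Biased LexRank
    in general allows a relevance distribution over all nodes.
    """
    if query not in weights:
        raise ValueError(f"query word {query!r} is not in the graph")
    words = list(weights)
    scores = {w: 1.0 / len(words) for w in words}
    for _ in range(iterations):
        # Restart term: probability (1 - damping) of jumping back to the query.
        new = {w: (1.0 - damping) if w == query else 0.0 for w in words}
        # Walk term: follow co-occurrence edges in proportion to their weight.
        for u in words:
            total = sum(weights[u].values())
            for v, count in weights[u].items():
                new[v] += damping * scores[u] * count / total
        scores = new
    return scores


# Hypothetical tokenized tweets posted in one area.
tweets_in_area = [
    ["tokyo_tower", "cherry_blossom", "park"],
    ["tokyo_tower", "cherry_blossom"],
    ["cherry_blossom", "picnic"],
    ["tokyo_tower", "night_view"],
    ["ramen", "station"],
]

graph = cooccurrence_graph(tweets_in_area)
scores = biased_rank(graph, "tokyo_tower")
related = sorted((w for w in scores if w != "tokyo_tower"),
                 key=scores.get, reverse=True)
print(related)
```

In this toy area, words that never co-occur with the query, directly or through intermediate words, end up with scores near zero, while "picnic" receives a score through "cherry_blossom" even though it never appears in the same tweet as "tokyo_tower". Running the same procedure per area would yield area-specific relatedness scores.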
