BUTE: bursty users tagging method estimated by time series data

Many Twitter users post tweets related to their particular interests, and they also collect information by following other users. One way to clarify user interests is to tag users with labels. User tagging is important for discovering candidate users with similar interests. Typical approaches estimate user interests from the terms in tweets and from graph analysis of structures such as following networks. In contrast, we propose a new user tagging method that uses time series of the number of posted tweets, based on the following hypothesis: because users have interests, they post more tweets when events related to those interests occur than at other times. Based on this hypothesis, we extract interests as burst levels from the user and hashtag time series data with Kleinberg's burst enumerating algorithm. We treat the burst levels of users like term frequencies in documents and calculate the hashtag scores for each user with three typical score calculation methods: cosine similarity, Naive Bayes, and TF-IDF. The proposed method therefore requires no linguistic analysis, which demands heavy computational resources. In experimental evaluations with active users, we demonstrate the high efficiency of our tagging methods, evaluate them with information retrieval metrics such as expected reciprocal rank (ERR) and Q-measure, and clarify the strengths and limitations of each method. Naive Bayes and cosine similarity are especially suitable for user tagging and tag score calculation tasks.
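
The cosine similarity scoring step described above can be illustrated with a minimal sketch. It assumes that burst levels have already been extracted (for example with Kleinberg's algorithm, which is not implemented here) and that each user and each hashtag is represented by a vector of burst levels over the same time bins. The function names, the hashtags, and the sample vectors are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length burst-level vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a series with no bursts matches nothing
    return dot / (norm_u * norm_v)


def rank_hashtags(user_bursts, hashtag_bursts):
    """Score every hashtag against one user and sort by descending score."""
    scores = {
        tag: cosine(user_bursts, levels)
        for tag, levels in hashtag_bursts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical burst levels over five time bins (assumed, not real data).
user = [0, 2, 0, 1, 0]
hashtags = {
    "#worldcup": [0, 3, 0, 1, 0],   # bursts in the same bins as the user
    "#election": [1, 0, 2, 0, 0],   # bursts in different bins
}

ranking = rank_hashtags(user, hashtags)
for tag, score in ranking:
    print(f"{tag}: {score:.3f}")
```

In this sketch, a hashtag whose bursts fall in the same time bins as the user's bursts receives a high score, while a hashtag bursting at other times scores zero. No tweet text is inspected at any point, which reflects the abstract's claim that the method avoids linguistic analysis.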
