Am I typing fresh tweets: Detecting up-to-dateness and worth of categorical information in microblogs

We compare tweets from bot and normal users and check their up-to-dateness using RSS feeds. We evaluate the relevance of tweet contents to the categorical taxonomies of users. Longer tweets of bot users reflect their categories better than those of normal users. With reference to RSS news feeds, content provided by bot users is more up-to-date.

Microblogs are among the most popular social networking platforms, where users share their opinions, daily activities, interests, and other content. Because microblog entries generally reflect a user's interests, the user's fields of interest can be extracted from the posted content. In this study, we group microblog users as normal or bot depending on the content they supply, and we evaluate how well each group reflects its categories with fresh entries, essentially by using content mining. Traditional content mining studies do not evaluate whether user entries are up-to-date. Unlike similar studies, we check the up-to-dateness of users' content by retrieving user entries and RSS news feeds simultaneously. A term from a user entry that is absent from the feature set built from the RSS news feeds is not used as a feature when checking the freshness of the content. For each user group, we divide users into predefined categories and inspect how well the users in that group post category-relevant entries, while also checking the up-to-dateness of their content. Our experimental results indicate that bot users consistently post fresher and more category-relevant entries than normal users. Finally, we visualize the categorization performance of each user group's entries with Cobweb. The Cobweb presentation reveals the miscategorization tendencies of the user groups.
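The term-filtering step described above can be illustrated with a minimal sketch. It assumes a simple alphabetic tokenizer and treats every term seen in the RSS items as part of the feature set; the function names (`tokenize`, `build_rss_vocabulary`, `fresh_features`, `freshness_ratio`) and the ratio itself are hypothetical illustrations, not the feature selection or scoring used in the study.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def build_rss_vocabulary(rss_items):
    """Collect every term that appears in the retrieved RSS news items."""
    vocab = set()
    for item in rss_items:
        vocab.update(tokenize(item))
    return vocab


def fresh_features(entry, rss_vocab):
    """Keep only the entry's terms that also occur in the RSS feature set."""
    return [term for term in tokenize(entry) if term in rss_vocab]


def freshness_ratio(entry, rss_vocab):
    """Hypothetical score: share of the entry's terms found in the RSS feature set."""
    terms = tokenize(entry)
    if not terms:
        return 0.0
    return len(fresh_features(entry, rss_vocab)) / len(terms)


# Toy data, invented for illustration only.
rss_items = [
    "Central bank raises interest rates",
    "Markets react to rate decision",
]
vocab = build_rss_vocabulary(rss_items)

bot_entry = "Central bank raises rates again"
user_entry = "had a great coffee this morning"

print(fresh_features(bot_entry, vocab))
print(freshness_ratio(bot_entry, vocab))
print(freshness_ratio(user_entry, vocab))
```

In this toy setting, an entry that shares most of its terms with the current feeds scores high, while an entry about unrelated daily activities contributes no freshness features at all.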
