Active Information Retrieval for Linking Twitter Posts with Political Debates

Users of microblogging social networks produce millions of short messages every day. Retrieving the information relevant to a particular event from this volume of data is not a trivial task. In this paper, we present a framework for retrieving Twitter posts that are relevant to a set of political debates. Our main contribution is a set of strategies for involving the user in the retrieval process: by presenting her with meaningful posts to label, the method achieves noticeably higher accuracy. The correct relevance labels can be provided by an external information source such as a domain expert, or simulated with an oracle. A key aspect of active retrieval methods is to request labels for the instances that improve retrieval accuracy the most, while keeping the number of labeling requests to a minimum. The proposed strategies for selecting labeling requests use both the textual content of tweets and their structural information. The experimental results show the advantages of the proposed methods and the effectiveness of the selection strategies for involving the user in the retrieval process.
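
The general idea of an active retrieval loop with a simulated oracle can be illustrated with a minimal sketch. It is not the selection strategy proposed in the paper: it assumes a plain bag-of-words cosine score, a fixed relevance threshold, and an uncertainty-based selection rule that asks about the post whose score lies closest to the threshold. It also ignores the structural information of tweets entirely. The names `active_retrieval`, `oracle`, `budget`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumption: whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def active_retrieval(tweets, query, oracle, budget, threshold=0.3):
    """Retrieve tweets relevant to `query`, asking `oracle` at most `budget` times.

    Assumed selection rule: label the unlabeled tweet whose score is closest
    to the threshold, i.e. the one the current profile is least sure about.
    """
    profile = vectorize(query)            # current description of the debate
    vectors = [vectorize(t) for t in tweets]
    labels = {}                           # tweet index -> True / False

    for _ in range(min(budget, len(tweets))):
        unlabeled = [i for i in range(len(tweets)) if i not in labels]
        pick = min(
            unlabeled,
            key=lambda i: abs(cosine(vectors[i], profile) - threshold),
        )
        labels[pick] = oracle(tweets[pick])
        if labels[pick]:
            # Fold the terms of a confirmed relevant tweet into the profile.
            profile.update(vectors[pick])

    retrieved = []
    for i in range(len(tweets)):
        if i in labels:
            if labels[i]:                 # trust the oracle where available
                retrieved.append(i)
        elif cosine(vectors[i], profile) >= threshold:
            retrieved.append(i)
    return retrieved, labels
```

In this sketch the oracle is any callable that maps a tweet's text to a boolean, so a ground-truth lookup can stand in for a domain expert during experiments. The budget caps the number of labeling requests, which reflects the trade-off described above between accuracy gains and labeling effort.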
