Structuring Tweets for improving Twitter search

Spam and wildly varying documents make Twitter search challenging. Most Twitter search systems treat a Tweet as plain text when modeling relevance. However, a set of conventions allows users to structure their Tweets by combining different blocks of text. These blocks include plain text, hashtags, links, mentions, etc. Each block encodes a variety of communicative intent, and the sequence of these blocks captures changing discourse. Previous work shows that exploiting structural information can improve the retrieval of structured documents (e.g., web pages). In this study we utilize the structure of Tweets, induced by these blocks, for Twitter retrieval and Twitter opinion retrieval. For Twitter retrieval, a set of features derived from the blocks of text and their combinations is used in a learning‐to‐rank scenario. We show that structuring Tweets can achieve state‐of‐the‐art performance. Our approach does not rely on social media features, but when we do add this additional information, performance improves significantly. For Twitter opinion retrieval, we explore whether structural information derived from the body of Tweets and opinionatedness ratings of Tweets can improve performance. Experimental results show that retrieval using a novel unsupervised opinionatedness feature based on structuring Tweets achieves performance comparable to that of a supervised method using manually tagged Tweets. Topic‐related specific structured Tweet sets are shown to help with query‐dependent opinion retrieval.
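
As a rough illustration of the block idea described above, the following minimal sketch splits a Tweet into typed blocks (plain text, hashtag, link, mention) and reads off the block sequence. The regular expressions, the function names `segment_tweet` and `block_sequence`, and the block labels are assumptions made for this example; they are not the segmentation method or the feature set used in the study.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for the non-text blocks; real Tweets need more care
# (e.g. e-mail addresses, punctuation glued to links).
BLOCK_PATTERN = re.compile(
    r"(?P<link>https?://\S+)"
    r"|(?P<mention>@\w+)"
    r"|(?P<hashtag>#\w+)"
)


def segment_tweet(text: str) -> List[Tuple[str, str]]:
    """Split a Tweet into (block_type, block_content) pairs, in order."""
    blocks: List[Tuple[str, str]] = []
    pos = 0
    for match in BLOCK_PATTERN.finditer(text):
        # Anything between two special blocks is treated as plain text.
        plain = text[pos:match.start()].strip()
        if plain:
            blocks.append(("text", plain))
        # lastgroup is the name of the alternative that matched.
        blocks.append((match.lastgroup, match.group()))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        blocks.append(("text", tail))
    return blocks


def block_sequence(blocks: List[Tuple[str, str]]) -> List[str]:
    """Return only the block types, i.e. the structure of the Tweet."""
    return [kind for kind, _ in blocks]


if __name__ == "__main__":
    tweet = "@bob check this out http://example.com #search"
    print(block_sequence(segment_tweet(tweet)))
```

A sequence such as mention, text, link, hashtag could then serve as one input to structure-based ranking features, alongside per-block text matching; how the study actually derives its features is not specified in this abstract.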
