Be In The Know: Connecting News Articles to Relevant Twitter Conversations

In the era of data-driven journalism, data analytics can give journalists tools for following new and developing news stories as they are echoed in micro-blogs such as Twitter, the new citizen-driven media. In this paper, we propose a framework for tracking news articles and automatically connecting them to Twitter conversations, as captured by Twitter hashtags. For example, such a system could alert journalists to news that gets a lot of Twitter reaction, so that they can investigate those conversations for new developments in the story, promote their article to a set of interested consumers, or discover the general sentiment towards the story. Mapping articles to appropriate hashtags is nevertheless very challenging, due to the different language styles used in articles versus tweets, the streaming nature of news and tweets, and the way users choose which tweet terms to mark as hashtags. As a case study, we continuously track the RSS feeds of Irish Times news articles and a focused Twitter stream over a two-month period, and present a system that assigns hashtags to each article based on its Twitter echo. We propose a machine learning approach for classifying and ranking article-hashtag pairs. Our empirical study shows that our system delivers high precision for this task.
