Mining social media for newsgathering: A review

Abstract: Social media is becoming an increasingly important data source for learning about breaking news and for following the latest developments of ongoing news. This is possible in part thanks to mobile devices, which allow anyone with access to the Internet to post updates from anywhere, leading in turn to a growing presence of citizen journalism. Consequently, social media has become a go-to resource for journalists during the process of newsgathering. Using social media for newsgathering is, however, challenging, and suitable tools are needed to facilitate access to information that is useful for reporting. In this paper, we provide an overview of research in data mining and natural language processing for mining social media for newsgathering. We discuss five areas that researchers have worked on to mitigate the challenges inherent to social media newsgathering: news discovery, curation of news, validation and verification of content, newsgathering dashboards, and other tasks. We outline the progress made so far in the field, summarise the current challenges, and discuss future directions in the use of computational journalism to assist with social media newsgathering. This review is relevant to computer scientists researching news in social media as well as to interdisciplinary researchers interested in the intersection of computer science and journalism.
