SciLens: Evaluating the Quality of Scientific News Articles Using Social Media and Scientific Literature Indicators

This paper describes, develops, and validates SciLens, a method to evaluate the quality of scientific news articles. The starting point for our work is a set of structured methodologies that define a series of quality aspects for manually evaluating news. Based on these aspects, we describe a series of indicators of news quality. According to our experiments, non-experts who have access to these indicators evaluate the quality of a scientific news article more accurately than non-experts who do not. Furthermore, SciLens can be used to produce a completely automated quality score for an article, which agrees more closely with expert evaluators than manual evaluations done by non-experts do. One of the main elements of SciLens is its focus on both the content and the context of articles, where context is provided by (1) explicit and implicit references in the article to scientific literature, and (2) reactions in social media referencing the article. We show that both contextual elements can be valuable sources of information for determining article quality. The validation of SciLens, done through a combination of expert and non-expert annotation, demonstrates its effectiveness for both semi-automatic and automatic quality evaluation of scientific news.
