Measuring prominence of scientific work in online news as a proxy for impact

The impact of a scientific paper on the work of other academics can be measured with many established metrics, including those based on citation counts and social media commentary. However, how to measure the impact of a scientific paper on wider society is less well established. For example, is it important for scientific work to be newsworthy? Here we present a new corpus of newspaper articles linked to the scientific papers that they describe. We find that impact case studies submitted to the UK Research Excellence Framework (REF) 2014 that refer to scientific papers mentioned in newspaper articles were awarded higher scores in the REF assessment. The papers associated with these case studies also feature prominently in the newspaper articles. We hypothesise that such prominence can be a useful proxy for societal impact. We therefore provide a novel baseline approach for measuring the prominence of scientific papers mentioned within news articles. Our measure of prominence is based on semantic similarity and computed with a graph-based ranking algorithm. We find that scientific papers with an associated REF case study are more likely to have a higher prominence score. This supports our hypothesis that linguistic prominence in news can serve as an indicator of the wider, non-academic impact of scientific work.
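The abstract describes the prominence measure only at a high level. As a rough illustration, the Python sketch below assumes (i) that a news article has already been split into sentences and that we know which sentences mention the scientific paper, (ii) bag-of-words cosine similarity as a stand-in for the semantic similarity model, which the abstract does not specify, and (iii) a TextRank-style weighted PageRank as the graph-based ranking algorithm. The `prominence` function and its normalisation are hypothetical choices made for this example, not the authors' definition.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    # Lowercased alphanumeric tokens; a crude stand-in for proper tokenisation.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_sentences(sentences: list[str], damping: float = 0.85,
                   iterations: int = 100, tol: float = 1e-9) -> list[float]:
    # TextRank-style weighted PageRank over a sentence graph whose edge
    # weights are pairwise similarities (no self-loops). Sentences with zero
    # similarity to every other sentence pass on no score.
    n = len(sentences)
    if n == 0:
        return []
    vectors = [Counter(tokenize(s)) for s in sentences]
    weights = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = []
        for i in range(n):
            # Each neighbour j passes on score in proportion to edge weight j->i.
            incoming = sum(weights[j][i] / out_sums[j] * scores[j]
                           for j in range(n) if out_sums[j] > 0)
            new.append((1 - damping) / n + damping * incoming)
        delta = max(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


def prominence(sentences: list[str], mentions_paper: list[bool]) -> float:
    # Hypothetical score: the best-ranked sentence that mentions the paper,
    # relative to the best-ranked sentence in the article (range 0..1).
    scores = rank_sentences(sentences)
    if not scores:
        return 0.0
    paper_scores = [s for s, m in zip(scores, mentions_paper) if m]
    return max(paper_scores) / max(scores) if paper_scores else 0.0
```

Bag-of-words vectors keep the sketch dependency-free; in practice the `cosine` step could be replaced by similarity between sentence embeddings without changing the ranking loop. A paper mentioned in the article's most central sentence scores 1.0, while a paper mentioned only in peripheral sentences scores closer to 0.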
