HarriGT: A Tool for Linking News to Science

Being able to reliably link scientific works to the newspaper articles that discuss them could provide a breakthrough in the way we rationalise and measure the impact of science on our society. Linking these articles is challenging because the language used in the two domains is very different, and gathering the online resources needed to align the two is a substantial information retrieval endeavour. We present HarriGT, a semi-automated tool for building corpora of news articles linked to the scientific papers that they discuss. Our aim is to facilitate the future development of information retrieval tools for linking newspaper articles to the scientific works they cite. HarriGT retrieves newspaper articles from an archive containing 17 years of UK web content. It also integrates with three large external citation networks, leveraging named entity extraction and document classification to surface relevant examples of scientific literature to the user. We also provide a tuned candidate ranking algorithm that presents potential links between scientific papers and newspaper articles to the user in order of likelihood. HarriGT is provided as an open-source tool (http://harrigt.xyz).
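The abstract does not specify how the candidate ranking algorithm scores links. The sketch below is purely illustrative, not HarriGT's method. It assumes a hypothetical weighted sum of three hand-picked features: author surnames mentioned in the article, institutions mentioned in the article, and how close the paper's publication date is to the article's date. It then sorts candidate papers by that score. The `Candidate` schema, the `WEIGHTS` values and the 30-day decay constant are all assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    """A scientific paper proposed as a possible link (hypothetical schema)."""
    title: str
    author_surnames: list[str]
    institutions: list[str]
    published: date


# Hypothetical feature weights; NOT the tuned weights used by HarriGT.
WEIGHTS = {"author": 2.0, "institution": 1.0, "recency": 1.0}


def score(article_text: str, article_date: date, cand: Candidate) -> float:
    """Weighted sum of simple entity-overlap and date-proximity features."""
    text = article_text.lower()
    # Count author surnames and institutions that appear verbatim in the article.
    author_hits = sum(1 for s in cand.author_surnames if s.lower() in text)
    inst_hits = sum(1 for i in cand.institutions if i.lower() in text)
    # Papers published close to the article date get a bonus that decays
    # with the gap in days (assumed 30-day scale).
    days = abs((article_date - cand.published).days)
    recency = 1.0 / (1.0 + days / 30.0)
    return (WEIGHTS["author"] * author_hits
            + WEIGHTS["institution"] * inst_hits
            + WEIGHTS["recency"] * recency)


def rank(article_text: str, article_date: date,
         candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates ordered from most to least likely link."""
    return sorted(candidates,
                  key=lambda c: score(article_text, article_date, c),
                  reverse=True)


if __name__ == "__main__":
    article = "Professor Jones at the University of Sheffield found that..."
    cands = [
        Candidate("B", ["Brown"], ["University of Oxford"], date(2017, 3, 9)),
        Candidate("A", ["Jones"], ["University of Sheffield"], date(2017, 3, 1)),
    ]
    for c in rank(article, date(2017, 3, 10), cands):
        print(c.title)
```

In this toy example, paper A is ranked first because the article names both its author and its institution, which outweighs paper B's slightly closer publication date. A real system would learn such weights from labelled links rather than fixing them by hand.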