What's in this paper? Combining Rhetorical Entities with Linked Open Data for Semantic Literature Querying

Finding research literature pertaining to a task at hand is one of the essential tasks that scientists face on a daily basis. Standard information retrieval techniques quickly return a vast number of potentially relevant documents. Unfortunately, inspecting these search results manually requires significant effort, whereas we would rather select relevant publications based on more fine-grained, semantically rich queries involving a publication's contributions, methods, or application domains. We argue that a novel combination of three distinct methods can significantly advance this vision: (i) Natural Language Processing (NLP) for Rhetorical Entity (RE) detection; (ii) Named Entity (NE) recognition based on the Linked Open Data (LOD) cloud; and (iii) automatic generation of RDF triples for both NEs and REs, using semantic web ontologies to interconnect them. Combined in a single workflow, these techniques allow us to automatically construct a knowledge base that facilitates numerous advanced use cases for managing scientific documents.
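
To make step (iii) concrete, the following is a minimal sketch of how detected REs and the NEs they mention might be turned into interconnected triples and then queried. It is an illustration under stated assumptions, not the authors' implementation: the `http://example.org/` vocabulary, the property names `hasRhetoricalEntity` and `mentions`, and the helper functions are all hypothetical, and the RE and NE detection steps are assumed to have already produced the input tuples. Only the `rdf:type` URI and the DBpedia resource prefix are real identifiers.

```python
from typing import List, NamedTuple, Sequence, Tuple

# Hypothetical vocabulary for illustration only; not the ontology used in the paper.
EX = "http://example.org/"
# Real identifiers: the rdf:type property and the DBpedia resource namespace.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DBPEDIA = "http://dbpedia.org/resource/"


class Triple(NamedTuple):
    s: str  # subject URI
    p: str  # predicate URI
    o: str  # object URI


def build_triples(
    doc_id: str,
    rhetorical_entities: Sequence[Tuple[str, str, Sequence[str]]],
) -> List[Triple]:
    """Build triples for one document.

    Each rhetorical entity is (local id, RE type, DBpedia resource names
    of the named entities found inside it).
    """
    doc = EX + "doc/" + doc_id
    triples: List[Triple] = []
    for re_id, re_type, ne_names in rhetorical_entities:
        re_uri = f"{doc}#{re_id}"
        # Link the document to its rhetorical entity and type the entity.
        triples.append(Triple(doc, EX + "hasRhetoricalEntity", re_uri))
        triples.append(Triple(re_uri, RDF_TYPE, EX + re_type))
        # Interconnect the RE with the LOD resources it mentions.
        for name in ne_names:
            triples.append(Triple(re_uri, EX + "mentions", DBPEDIA + name))
    return triples


def docs_with(triples: Sequence[Triple], re_type: str, ne_name: str) -> List[str]:
    """Return documents having an RE of `re_type` that mentions `ne_name`."""
    typed = {t.s for t in triples if t.p == RDF_TYPE and t.o == EX + re_type}
    mentioning = {
        t.s for t in triples
        if t.p == EX + "mentions" and t.o == DBPEDIA + ne_name
    }
    hits = typed & mentioning
    return sorted(
        {t.s for t in triples
         if t.p == EX + "hasRhetoricalEntity" and t.o in hits}
    )


def to_ntriples(triples: Sequence[Triple]) -> str:
    """Serialize URI-only triples in N-Triples syntax."""
    return "\n".join(f"<{t.s}> <{t.p}> <{t.o}> ." for t in triples)


if __name__ == "__main__":
    kb = build_triples("paper1", [("re1", "Contribution", ["Linked_data"])])
    kb += build_triples("paper2", [("re1", "Claim", ["Linked_data"])])
    print(to_ntriples(kb))
    # Which papers have a contribution that mentions Linked Data?
    print(docs_with(kb, "Contribution", "Linked_data"))
```

In practice such triples would be loaded into a triple store and queried with SPARQL; the in-memory set intersection above only stands in for that kind of query, for example "all documents whose contribution mentions a given LOD resource".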
