From Publications to Knowledge Graphs

We address the task of compiling structured documentation of research processes in the form of knowledge graphs by automatically extracting information from publications and associating it with information from other sources. This challenge has not previously been addressed at the level described here. We have developed a process and a system that leverages existing information from DBpedia, retrieves articles from repositories, extracts and interrelates various kinds of named and non-named entities by exploiting article metadata, the structure of the text, and syntactic, lexical, and semantic constraints, and populates a knowledge base in the form of RDF triples. An ontology designed to represent scholarly practices drives the whole process. Rule-based and machine learning-based methods that account for the nature of scientific texts and a wide variety of writing styles have been developed for the task. Evaluation on datasets from three disciplines, Digital Humanities, Bioinformatics, and Medicine, shows very promising performance.
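To make the final step concrete, the following is a minimal sketch of how extracted entities and relations could be serialized as RDF triples in N-Triples syntax. It is not the system described above: the namespaces under `example.org`, the class names `Activity` and `Method`, the `employs` property, and the sample entities are all hypothetical placeholders chosen for illustration.

```python
from typing import List, NamedTuple

# Hypothetical namespaces; a real system would use the ontology's own URIs.
EX = "http://example.org/so#"
RES = "http://example.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class Entity(NamedTuple):
    local_id: str  # identifier assigned by the (assumed) extraction step
    cls: str       # ontology class the entity was typed with
    label: str     # surface text found in the publication


class Relation(NamedTuple):
    subject: str    # local_id of the subject entity
    predicate: str  # ontology property name
    obj: str        # local_id of the object entity


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(entities: List[Entity], relations: List[Relation]) -> List[str]:
    """Turn entities and relations into N-Triples lines."""
    lines = []
    for e in entities:
        subject = f"<{RES}{e.local_id}>"
        # One triple for the class assertion, one for the text label.
        lines.append(f"{subject} <{RDF_TYPE}> <{EX}{e.cls}> .")
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(e.label)}" .')
    for r in relations:
        lines.append(f"<{RES}{r.subject}> <{EX}{r.predicate}> <{RES}{r.obj}> .")
    return lines


# Made-up sample data standing in for the output of an extraction step.
entities = [
    Entity("act1", "Activity", "topic modelling of the corpus"),
    Entity("m1", "Method", "LDA"),
]
relations = [Relation("act1", "employs", "m1")]

triples = to_ntriples(entities, relations)
for line in triples:
    print(line)
```

Keeping the extraction output in small typed records and serializing only at the end means the same records could also be passed to an RDF library or a triple store client instead of being printed.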
