AI-KG: An Automatically Generated Knowledge Graph of Artificial Intelligence

Scientific knowledge has traditionally been disseminated and preserved through research articles published in journals, conference proceedings, and online archives. However, this article-centric paradigm has often been criticized because it does not allow this knowledge to be automatically processed, categorized, and reasoned over. An alternative vision is to generate a semantically rich and interlinked description of the content of research publications. In this paper, we present the Artificial Intelligence Knowledge Graph (AI-KG), a large-scale, automatically generated knowledge graph that describes 820K research entities. AI-KG includes about 14M RDF triples and 1.2M reified statements extracted from 333K research publications in the field of AI, and describes 5 types of entities (tasks, methods, metrics, materials, others) linked by 27 relations. AI-KG has been designed to support a variety of intelligent services for analyzing and making sense of research dynamics, supporting researchers in their daily work, and informing the decisions of funding bodies and research policymakers. AI-KG has been generated by an automatic pipeline that extracts entities and relationships using three tools: DyGIE++, Stanford CoreNLP, and the CSO Classifier. The pipeline then integrates and filters the resulting triples using a combination of deep learning and semantic technologies in order to produce a high-quality knowledge graph. This pipeline was evaluated on a manually crafted gold standard, yielding competitive results. AI-KG is available under CC BY 4.0 and can be downloaded as a dump or queried via a SPARQL endpoint.
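The integration and filtering step described above can be illustrated with a minimal sketch. It is not the AI-KG pipeline: the record types, the relation names, and the lowercase normalization are hypothetical, and a simple support threshold (the number of papers a triple was extracted from) stands in for the deep learning and semantic filtering mentioned in the abstract. The sketch only shows the general idea of merging triples extracted from many papers into reified statements that carry their own provenance.

```python
from dataclasses import dataclass, field


# Hypothetical record types for illustration; not the AI-KG schema.
@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


@dataclass
class ReifiedStatement:
    """A triple plus metadata about the triple itself (here: source papers)."""
    triple: Triple
    papers: set = field(default_factory=set)

    @property
    def support(self) -> int:
        # Number of distinct papers the triple was extracted from.
        return len(self.papers)


def integrate(extractions):
    """Merge (paper_id, Triple) pairs into one statement per distinct triple.

    Entity labels are lowercased so that surface variants collapse;
    this is an assumed, deliberately naive normalization.
    """
    statements = {}
    for paper_id, triple in extractions:
        key = Triple(triple.subject.lower(), triple.relation, triple.obj.lower())
        stmt = statements.setdefault(key, ReifiedStatement(key))
        stmt.papers.add(paper_id)
    return list(statements.values())


def filter_by_support(statements, min_support=2):
    """Keep statements seen in at least min_support papers (assumed heuristic)."""
    return [s for s in statements if s.support >= min_support]


# Made-up extractions; paper IDs and relation names are placeholders.
extractions = [
    ("paper-1", Triple("Neural Network", "usesMethod", "Backpropagation")),
    ("paper-2", Triple("neural network", "usesMethod", "backpropagation")),
    ("paper-3", Triple("neural network", "usesMaterial", "imagenet")),
]
statements = integrate(extractions)
kept = filter_by_support(statements, min_support=2)
```

Keeping the source papers on each statement, rather than on the bare triple, is what makes a reified representation useful here: downstream services can rank or filter statements by how widely they are supported.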
