The Largest Knowledge Graph in Materials Science: Entities, Relations, and Link Prediction through Graph Representation Learning

This paper introduces MatKG, a novel graph database of key concepts in materials science spanning the traditional material-structure-property-processing paradigm. MatKG is generated autonomously using transformer-based large language models, and its pseudo-ontological schema is derived through statistical co-occurrence mapping. At present, MatKG contains over 2 million unique relationship triples derived from 80,000 entities. This allows the curated analysis, querying, and visualization of materials knowledge at unique resolution and scale. Further, Knowledge Graph Embedding models are used to learn embedding representations of the nodes in the graph, which are then used for downstream tasks such as link prediction and entity disambiguation. MatKG allows the rapid dissemination and assimilation of data when used as a knowledge base, while enabling the discovery of new relations when trained as an embedding model.
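As a rough illustration of what statistical co-occurrence mapping could look like, the sketch below counts how often two tagged entities appear in the same document and emits a weighted triple for each pair. It is not the authors' pipeline: the toy documents, the function name `cooccurrence_triples`, the category labels, and the convention of naming a relation after the two entity categories are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: each document is a list of (entity, category) pairs,
# as might be produced by a named entity recognition step.
docs = [
    [("TiO2", "material"), ("photocatalysis", "application"), ("band gap", "property")],
    [("TiO2", "material"), ("photocatalysis", "application")],
    [("CdTe", "material"), ("band gap", "property")],
]


def cooccurrence_triples(documents, min_count=1):
    """Count entity pairs that share a document and return weighted triples.

    Each key is (head, relation, tail); the relation label is simply the two
    entity categories joined by a hyphen (an assumption of this sketch).
    """
    counts = Counter()
    for doc in documents:
        # Deduplicate and sort so each pair is counted once per document
        # and always in the same head/tail order.
        unique = sorted(set(doc))
        for (e1, c1), (e2, c2) in combinations(unique, 2):
            counts[(e1, f"{c1}-{c2}", e2)] += 1
    # Keep only pairs seen in at least `min_count` documents.
    return {triple: n for triple, n in counts.items() if n >= min_count}


triples = cooccurrence_triples(docs)
for (head, relation, tail), n in sorted(triples.items()):
    print(head, relation, tail, n)
```

Raising `min_count` filters out pairs that co-occur only rarely, which is one simple way to trade coverage for precision in a graph built this way.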
