Proposal for Automatic Extraction Framework of Superconductors Related Information from Scientific Literature

Automatic collection of materials information from research papers using Natural Language Processing (NLP) is in high demand for rapid materials development based on big data, an approach known as materials informatics (MI). The main difficulty of automatic collection is the variety of expressions used in papers, so a robust system that tolerates such variety needs to be developed. In this paper, we report ongoing interdisciplinary work to construct a system for the automatic collection of superconductor-related information from scientific literature using text mining techniques. We focus on identifying superconducting material names and their key property, the critical temperature (Tc). We discuss the construction of a prototype that uses machine learning (ML) techniques to extract and link this physical information. From an evaluation on 500 sample documents, we define a baseline and a direction for future improvements.
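
To make the extraction and linking task concrete, the following minimal Python sketch shows what linking a material name to a Tc value could look like on a single sentence. It is not the ML prototype described above: it swaps in hand-written regular expressions and a "closest preceding material" heuristic purely for illustration. The names `MATERIAL_RE`, `TC_RE`, and `extract_links`, as well as the example sentence, are assumptions introduced here.

```python
import re

# Hypothetical, deliberately crude patterns (NOT the ML models of the prototype).
# A "material" is a run of two or more element-like tokens, e.g. MgB2, LaFeAsO.
# This also matches unrelated acronyms such as "NLP", which is one reason a
# learned model is preferable to rules.
MATERIAL_RE = re.compile(r"\b(?:[A-Z][a-z]?\d*(?:\.\d+)?){2,}\b")
# A temperature is a number followed by the unit K, e.g. "39 K" or "4.2 K".
TC_RE = re.compile(r"(\d+(?:\.\d+)?)\s*K\b")


def extract_links(sentence):
    """Return (material, temperature_in_K) pairs found in one sentence.

    Linking heuristic (an assumption): each temperature is attached to the
    closest material name that appears before it in the sentence.
    """
    materials = [(m.group(0), m.start()) for m in MATERIAL_RE.finditer(sentence)]
    temperatures = [(float(t.group(1)), t.start()) for t in TC_RE.finditer(sentence)]

    links = []
    for value, t_pos in temperatures:
        # Materials are in document order, so the last one before the
        # temperature is the closest preceding mention.
        preceding = [name for name, m_pos in materials if m_pos < t_pos]
        if preceding:
            links.append((preceding[-1], value))
    return links


if __name__ == "__main__":
    text = ("MgB2 shows superconductivity below 39 K, "
            "whereas LaFeAsO becomes superconducting at 26 K.")
    print(extract_links(text))
```

Rules of this kind break quickly on the varied expressions found in real papers (for example, a Tc mentioned before its material, or in a different sentence), which is the motivation given above for a more robust, ML-based system.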
