MMKG: An approach to generate metallic materials knowledge graph based on DBpedia and Wikipedia

Abstract The research and development of metallic materials plays an important role in today’s society, and a large amount of metallic materials knowledge is generated and made available on the Web (e.g., Wikipedia) for materials experts. However, the diversity and complexity of this knowledge make it inconvenient to use. A knowledge graph (e.g., DBpedia) offers a good way to organize knowledge into a comprehensive entity network. Our work is therefore motivated by the goal of generating a metallic materials knowledge graph (MMKG) from knowledge available on the Web. In this paper, we propose an approach to building MMKG based on DBpedia and Wikipedia. First, we use an algorithm based on directly linked sub-graph semantic distance (DLSSD) to perform a preliminary extraction of metallic materials entities from DBpedia, starting from predefined seed entities; then, building on the results of the preliminary extraction, we use an algorithm that considers both semantic distance and string similarity (SDSS) to perform a further extraction. Second, because DBpedia lacks materials properties, we use an ontology-based method to extract property knowledge from the HTML tables of the corresponding Wikipedia pages to enrich MMKG. A materials ontology is used both to locate materials property tables and to identify their structure. The approach is evaluated in terms of precision, recall, F1, and time performance, and appropriate thresholds for the algorithms are determined experimentally. The experimental results show that our approach achieves the expected performance. A prototype tool is also designed to facilitate building the MMKG and to demonstrate the effectiveness of our approach.
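The evaluation described in the abstract (precision, recall, and F1, with thresholds chosen experimentally) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy relatedness scores, the candidate thresholds, and the gold-standard set are all hypothetical. The sketch also assumes each candidate entity already carries a score where higher means more closely related to the seed entities; the DLSSD and SDSS computations themselves are not reproduced here.

```python
def precision_recall_f1(predicted, gold):
    """Standard set-based precision, recall, and F1."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def sweep_thresholds(scores, gold, thresholds):
    """For each threshold, keep entities scoring at or above it and evaluate."""
    results = []
    for t in thresholds:
        extracted = {entity for entity, s in scores.items() if s >= t}
        results.append((t, *precision_recall_f1(extracted, gold)))
    return results


def best_threshold(results):
    """Pick the threshold with the highest F1 (first one wins on ties)."""
    return max(results, key=lambda row: row[3])[0]


# Hypothetical toy data: relatedness scores for candidate entities
# (assumed: higher = more related) and a hand-labelled gold set.
scores = {
    "Steel": 0.92,
    "Bronze": 0.81,
    "Alloy": 0.77,
    "Bridge": 0.55,
    "Blacksmith": 0.40,
}
gold = {"Steel", "Bronze", "Alloy"}

results = sweep_thresholds(scores, gold, [0.3, 0.5, 0.7, 0.9])
for t, p, r, f1 in results:
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
print("best threshold by F1:", best_threshold(results))
```

Selecting by F1 is one reasonable criterion among several; a real experiment might instead weigh precision against recall, or take extraction time into account, as the abstract's mention of time performance suggests.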
