Knowledge graph construction from multiple online encyclopedias

In recent years, many knowledge graphs built from Wikipedia, the largest multilingual online encyclopedia, have been published on the Web to support various applications. However, because non-English data in Wikipedia are sparse, some projects instead construct knowledge graphs from multiple non-English online encyclopedias. These projects omit many technical details, which makes their frameworks and techniques hard to reuse. In this paper, we propose a new framework for knowledge graph construction from multiple online encyclopedias. Its core modules are knowledge extraction and knowledge linking. Knowledge extraction consists of regular extraction, which periodically extracts the targeted article contents from the entire online encyclopedias, and live extraction, which extracts only the article contents of new and updated entities. Knowledge linking uses lightweight heuristic entity matching strategies and a semi-supervised learning method to find duplicate entities and properties across different online encyclopedias. Experimental results show that our approaches to knowledge extraction and linking outperform state-of-the-art baselines on different evaluation metrics, and that our framework can generate a large-scale knowledge graph from multiple online encyclopedias as input.
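
The contrast between regular and live extraction, and the idea of a lightweight matching heuristic, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `Article` record, the function names, the source names, and the title-normalization rule are all assumptions made for illustration, and the semi-supervised learning step is not shown.

```python
from dataclasses import dataclass
from datetime import datetime
import unicodedata


@dataclass(frozen=True)
class Article:
    """Hypothetical record for one encyclopedia article (illustrative only)."""
    source: str        # name of the online encyclopedia
    title: str         # article title, used here as the entity label
    updated: datetime  # last modification time


def regular_extraction(articles):
    """Regular extraction: process every targeted article in the snapshot."""
    return list(articles)


def live_extraction(articles, last_run):
    """Live extraction: keep only articles created or updated after last_run."""
    return [a for a in articles if a.updated > last_run]


def normalize(title):
    """Assumed heuristic: NFKC-normalize, casefold, and collapse whitespace."""
    return " ".join(unicodedata.normalize("NFKC", title).casefold().split())


def candidate_duplicates(articles):
    """Group articles from different sources whose normalized titles agree."""
    groups = {}
    for article in articles:
        groups.setdefault(normalize(article.title), []).append(article)
    # Only groups spanning more than one source are cross-encyclopedia candidates.
    return [g for g in groups.values() if len({a.source for a in g}) > 1]
```

Under these assumptions, a periodic job would call `regular_extraction` on a full snapshot, while `live_extraction` would run between snapshots with the timestamp of the previous run. The groups returned by `candidate_duplicates` are only candidates; in the framework described above, such heuristic matches are combined with a semi-supervised learning method.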
