Cross-Lingual Entity Query from Large-Scale Knowledge Graphs

A knowledge graph is a structured knowledge system that contains a large number of entities and relations, and it plays an important role in named entity query. English knowledge graphs such as DBpedia and YAGO provide open access to large numbers of high-quality named entities. Chinese knowledge graphs, however, are still under development: they contain fewer entities, and the relations between those entities are sparser. A natural question is how to use mature English knowledge graphs to query Chinese named entities and obtain rich relation networks. In this paper, we propose a Chinese entity query system based on English knowledge graphs. The basic idea is to build a cross-lingual entity linking model, RSVM, between Chinese and English Wikipedia, and to use it to build cross-lingual links between Chinese entities and English knowledge graphs. Experiments show that our approach achieves a precision of 82.3% on the task of finding cross-lingual entities on a test dataset. On the subtask of finding missing cross-lingual links, it achieves a precision of 89.42% with a recall of 80.47%.
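For readers unfamiliar with the two metrics reported for the missing-link subtask, the following is a minimal sketch of how precision and recall are commonly computed over a set of predicted cross-lingual links. The function name `precision_recall` and the toy link pairs are assumptions made for illustration; they are not the paper's evaluation code or data.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of predicted links against gold links."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # links that are both predicted and true
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: (Chinese title, English title) pairs.
gold_links = {
    ("北京", "Beijing"),
    ("长城", "Great Wall of China"),
    ("李白", "Li Bai"),
    ("茶", "Tea"),
}
predicted_links = {
    ("北京", "Beijing"),
    ("长城", "Great Wall of China"),
    ("李白", "Li Po (crater)"),  # deliberately wrong link in the toy data
}

p, r = precision_recall(predicted_links, gold_links)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case two of the three predicted links are correct and two of the four gold links are recovered, which illustrates why a system can report a higher precision than recall, as in the figures above.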
