Boosting to Build a Large-Scale Cross-Lingual Ontology

The global knowledge sharing makes large-scale multi-lingual knowledge bases an extremely valuable resource in the Big Data era. However, current mainstream Wikipedia-based multi-lingual ontologies still face the following problems: the scarcity of non-English knowledge, the noise in the multi-lingual ontology schema relations and the limited coverage of cross-lingual owl:sameAs relations. Building a cross-lingual ontology based on other large-scale heterogenous online wikis is a promising solution for those problems. In this paper, we propose a cross-lingually boosting approach to iteratively reinforce the performance of ontology building and instance matching. Experiments output an ontology containing over 3,520,000 English instances, 800,000 Chinese instances, and over 150,000 cross-lingual instance alignments. The F1-measure improvement of Chinese instanceOf prediction achieve the highest 32%.

[1]  Peng Zhang,et al.  XLore: A Large-scale English-Chinese Bilingual Knowledge Graph , 2013, SEMWEB.

[2]  Mansur R. Kabuka,et al.  Ontology matching with semantic verification , 2009, J. Web Semant..

[3]  Gerhard Weikum,et al.  WWW 2007 / Track: Semantic Web Session: Ontologies ABSTRACT YAGO: A Core of Semantic Knowledge , 2022 .

[4]  Simone Paolo Ponzetto,et al.  Deriving a Large-Scale Taxonomy from Wikipedia , 2007, AAAI.

[5]  Christopher D. Manning,et al.  Multiword Expression Identification with Tree Substitution Grammars: A Parsing tour de force with French , 2011, EMNLP.

[6]  Haixun Wang,et al.  Probase: a probabilistic taxonomy for text understanding , 2012, SIGMOD Conference.

[7]  Qiang Yang,et al.  A Machine Learning Approach for Instance Matching Based on Similarity Metrics , 2012, SEMWEB.

[8]  Jens Lehmann,et al.  DBpedia: A Nucleus for a Web of Open Data , 2007, ISWC/ASWC.

[9]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[10]  Yi Li,et al.  RiMOM: A Dynamic Multistrategy Ontology Alignment Framework , 2009, IEEE Transactions on Knowledge and Data Engineering.

[11]  Qiong Luo,et al.  Towards Ontology Learning from Folksonomies , 2009, IJCAI.

[12]  Simone Paolo Ponzetto,et al.  BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network , 2012, Artif. Intell..

[13]  Renata Vieira,et al.  A Framework for Multilingual Ontology Mapping , 2008, LREC.

[14]  Juan-Zi Li,et al.  Cross-lingual knowledge linking across wiki knowledge bases , 2012, WWW.

[15]  Guilin Qi,et al.  Zhishi.me - Weaving Chinese Linking Open Data , 2011, SEMWEB.

[16]  Jérôme Euzenat,et al.  Ontology Matching: State of the Art and Future Challenges , 2013, IEEE Transactions on Knowledge and Data Engineering.

[17]  Oren Etzioni,et al.  TextRunner: Open Information Extraction on the Web , 2007, NAACL.

[18]  Gerhard Weikum,et al.  MENTA: inducing multilingual taxonomies from wikipedia , 2010, CIKM '10.