Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia

Wikipedia infoboxes are a valuable source of structured knowledge for global knowledge sharing. However, infobox coverage is incomplete and unevenly distributed across Wikipedia's language editions. Using the rich structured knowledge in a source-language Wikipedia to complete missing infoboxes in a target language is a promising but challenging problem. In this paper, we formulate the problem of cross-lingual knowledge extraction from multilingual Wikipedia sources and present a novel framework, called WikiCiKE, to solve it. An instance-based transfer learning method is used to overcome the problems of topic drift and translation errors. Our experimental results demonstrate that WikiCiKE outperforms both the monolingual knowledge extraction method and the translation-based method.
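
To make "instance-based transfer learning" concrete, the sketch below shows one reweighting step in the style of TrAdaBoost, a well-known instance-based scheme. The abstract does not say that WikiCiKE uses exactly this update, so the function name, its parameters, and the clamping constants are all illustrative assumptions. The idea is that misclassified source-language instances (for example, ones damaged by translation errors or topic drift) lose weight, while misclassified target-language instances gain weight.

```python
import math


def tradaboost_reweight(src_weights, src_wrong, tgt_weights, tgt_wrong, n_rounds):
    """One TrAdaBoost-style weight update (illustrative, not the paper's code).

    src_wrong / tgt_wrong are lists of bools: True where the current weak
    classifier misclassified that instance. Requires at least one source
    instance and n_rounds > 0.
    """
    n_src = len(src_weights)

    # Weighted error is measured on target-language instances only.
    eps = sum(w for w, bad in zip(tgt_weights, tgt_wrong) if bad) / sum(tgt_weights)
    # Keep eps strictly inside (0, 0.5) so the factors below stay well defined.
    eps = min(max(eps, 1e-10), 0.499)

    beta_tgt = eps / (1.0 - eps)  # < 1, so dividing by it raises a weight
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_src) / n_rounds))  # <= 1

    # Misclassified source instances are down-weighted (likely unhelpful).
    new_src = [w * beta_src if bad else w for w, bad in zip(src_weights, src_wrong)]
    # Misclassified target instances are up-weighted (need more attention).
    new_tgt = [w / beta_tgt if bad else w for w, bad in zip(tgt_weights, tgt_wrong)]
    return new_src, new_tgt


# Hypothetical toy data: four source and four target instances.
new_src, new_tgt = tradaboost_reweight(
    [1.0, 1.0, 1.0, 1.0], [True, False, False, True],
    [1.0, 1.0, 1.0, 1.0], [True, False, False, False],
    n_rounds=10,
)
```

In a full boosting loop the weights would be renormalized and a new weak classifier trained on each round; that part is omitted here.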
