Analysis and Refinement of Cross-Lingual Entity Linking

In this paper we propose two novel approaches to enhance cross-lingual entity linking (CLEL). The first is based on cross-lingual information networks, aligned using monolingual information extraction; the second uses topic modeling to ensure global consistency. These approaches enhance a strong baseline system, derived from a combination of state-of-the-art machine translation and monolingual entity linking, and achieve an 11.2% improvement in B-Cubed+ F-measure. Our system achieved highly competitive results in the NIST Text Analysis Conference (TAC) Knowledge Base Population (KBP2011) evaluation. We also provide detailed qualitative and quantitative analysis of the contributions of each approach and of the remaining challenges.
