Cross Lingual Entity Linking with Bilingual Topic Model

Cross-lingual entity linking is the task of linking an entity mention in a source document written in one language to the corresponding real-world entity in a knowledge base written in another language. The key problem is measuring the similarity between the context of the entity mention and the document of a candidate entity. This paper presents a general framework for cross-lingual entity linking that leverages a large-scale bilingual knowledge base, Wikipedia. We introduce a bilingual topic model that mines bilingual topics from this knowledge base, under the assumption that Wikipedia articles describing the same concept in two different languages share the same semantic topic distribution. Each extracted topic has two representations, one per language. As a result, both the context of the entity mention and the document of the candidate entity can be represented in the same semantic topic space, and we use these topic representations to perform cross-lingual entity linking. Experimental results show that the proposed approach achieves results competitive with the state-of-the-art approach.
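The shared-topic idea above can be illustrated with a small sketch. It assumes a polylingual-LDA-style model trained by collapsed Gibbs sampling: each Wikipedia concept contributes one document pair whose two sides share a single topic-count vector, while each language keeps its own topic-word counts. A new monolingual document (the mention context or a candidate's KB document) is folded in with the topic-word distributions held fixed, and candidates are scored by cosine similarity of their topic distributions. The abstract does not specify the estimation procedure or the similarity measure, so both are assumptions, as are the hypothetical helper names `train_bilingual_lda`, `infer_theta`, and `cosine` and the toy English/Spanish data.

```python
import math
import random

# Hypothetical helpers: a sketch under stated assumptions, not the paper's implementation.


def train_bilingual_lda(pairs, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for a bilingual (polylingual-style) LDA.

    pairs: list of (tokens_lang0, tokens_lang1), one pair per Wikipedia concept.
    Both sides of a pair share one doc-topic count vector (the shared topic
    distribution); each language keeps its own topic-word counts.
    Returns (phi, widx): phi[l][k][v] = p(word v | topic k) in language l.
    """
    rng = random.Random(seed)
    vocab = [sorted({w for p in pairs for w in p[l]}) for l in (0, 1)]
    widx = [{w: i for i, w in enumerate(v)} for v in vocab]
    V = [len(v) for v in vocab]
    n_dk = [[0] * K for _ in pairs]                          # shared per concept pair
    n_kw = [[[0] * V[l] for _ in range(K)] for l in (0, 1)]  # per language
    n_k = [[0] * K for _ in (0, 1)]                          # per language
    z = []  # z[d][l][i]: topic of token i in language l of pair d
    for d, pair in enumerate(pairs):
        z.append([])
        for l in (0, 1):
            z[d].append([])
            for w in pair[l]:
                k = rng.randrange(K)
                z[d][l].append(k)
                n_dk[d][k] += 1
                n_kw[l][k][widx[l][w]] += 1
                n_k[l][k] += 1
    for _ in range(iters):
        for d, pair in enumerate(pairs):
            for l in (0, 1):
                for i, w in enumerate(pair[l]):
                    v, k = widx[l][w], z[d][l][i]
                    # Remove the current assignment from all counts.
                    n_dk[d][k] -= 1
                    n_kw[l][k][v] -= 1
                    n_k[l][k] -= 1
                    # Shared doc-topic term times language-specific word term.
                    weights = [(n_dk[d][t] + alpha) * (n_kw[l][t][v] + beta)
                               / (n_k[l][t] + V[l] * beta) for t in range(K)]
                    k = rng.choices(range(K), weights=weights)[0]
                    z[d][l][i] = k
                    n_dk[d][k] += 1
                    n_kw[l][k][v] += 1
                    n_k[l][k] += 1
    phi = [[[(n_kw[l][k][v] + beta) / (n_k[l][k] + V[l] * beta)
             for v in range(V[l])] for k in range(K)] for l in (0, 1)]
    return phi, widx


def infer_theta(tokens, lang, phi, widx, K, alpha=0.1, iters=50, seed=0):
    """Fold in a monolingual document with phi fixed; return its topic distribution."""
    rng = random.Random(seed)
    words = [widx[lang][w] for w in tokens if w in widx[lang]]  # drop unseen words
    z = [rng.randrange(K) for _ in words]
    n_k = [0] * K
    for k in z:
        n_k[k] += 1
    for _ in range(iters):
        for i, v in enumerate(words):
            n_k[z[i]] -= 1
            weights = [(n_k[t] + alpha) * phi[lang][t][v] for t in range(K)]
            z[i] = rng.choices(range(K), weights=weights)[0]
            n_k[z[i]] += 1
    return [(n_k[t] + alpha) / (len(words) + K * alpha) for t in range(K)]


def cosine(p, q):
    """Cosine similarity between two topic distributions (assumed score)."""
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


# Toy English/Spanish concept pairs (illustrative data, not from the paper).
pairs = [
    ("fruit tree orchard juice".split(), "fruta arbol huerto zumo".split()),
    ("fruit juice tree harvest".split(), "fruta zumo arbol cosecha".split()),
    ("computer software chip company".split(), "ordenador software chip empresa".split()),
    ("software company computer device".split(), "software empresa ordenador dispositivo".split()),
]
phi, widx = train_bilingual_lda(pairs, K=2)
# English mention context; Spanish candidate KB documents.
mention = infer_theta("apple released new software for its computer".split(), 0, phi, widx, K=2)
candidates = {"Apple Inc. (es)": "empresa ordenador software".split(),
              "Manzana (es)": "fruta arbol zumo".split()}
for name, doc in candidates.items():
    print(name, round(cosine(mention, infer_theta(doc, 1, phi, widx, K=2)), 3))
```

On data this small the sampled topics depend on the seed, so the printed scores are only indicative. The point of the design is that an English context and Spanish candidate documents both end up as vectors over the same K topics, which is what makes a direct similarity comparison across languages possible.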
