CUNY-UIUC-SRI TAC-KBP2011 Entity Linking System Description

In this paper we describe a joint effort by the City University of New York (CUNY), the University of Illinois at Urbana-Champaign (UIUC), and SRI International to participate in the mono-lingual entity linking (MLEL) and cross-lingual entity linking (CLEL) tasks of the NIST Text Analysis Conference (TAC) Knowledge Base Population (KBP2011) track. The MLEL system is a simple combination of two published systems, one by CUNY (Chen and Ji, 2011) and one by UIUC (Ratinov et al., 2011), so we focus mainly on describing our new CLEL system. In addition to a baseline system based on name translation, machine translation, and MLEL, we propose two novel approaches. The first is based on a cross-lingual name similarity matrix that is iteratively updated using monolingual co-occurrence; the second uses topic modeling to enhance performance. Our best systems placed 4th in the mono-lingual track and 2nd in the cross-lingual track.

[1] Zellig S. Harris, et al. Distributional Structure, 1954.

[2] David Yarowsky, et al. One Sense Per Discourse, 1992, HLT.

[3] Breck Baldwin, et al. Algorithms for Scoring Coreference Chains, 1998.

[4] Thorsten Joachims, et al. Making large scale SVM learning practical, 1998.

[5] Min Wan, et al. Study on topic segmenting method in automatic abstracting system, 2003, International Conference on Natural Language Processing and Knowledge Engineering, 2003. Proceedings.

[6] Heng Ji, et al. Using Semantic Relations to Refine Coreference Decisions, 2005, HLT.

[7] Maarten de Rijke, et al. Finding Similar Sentences across Multiple Languages in Wikipedia, 2006.

[8] Heng Ji, et al. Data Selection in Semi-supervised Learning for Name Tagging, 2006.

[9] Thorsten Joachims, et al. Training linear SVMs in linear time, 2006, KDD '06.

[10] Tie-Yan Liu, et al. Learning to rank: from pairwise approach to listwise approach, 2007, ICML '07.

[11] Patrick Schone, et al. Mining Wiki Resources for Multilingual Named Entity Recognition, 2008, ACL.

[12] Gosse Bouma, et al. Cross-lingual Alignment and Completion of Wikipedia Templates, 2009.

[13] Takahiro Hara, et al. Improving the extraction of bilingual terminology from Wikipedia, 2009, TOMCCAP.

[14] Michael Skinner, et al. Information arbitrage across multi-lingual Wikipedia, 2009, WSDM '09.

[15] Fredric C. Gey, et al. Information Access in a Multilingual World, 2009.

[16] Elena Filatova. Multilingual Wikipedia, Summarization, and Information Trustworthiness, 2009.

[17] Wen Wang, et al. Using syntax in large-scale audio document translation, 2009, INTERSPEECH.

[18] Seung-won Hwang, et al. Mining Name Translations from Entity Graph Mapping, 2010, EMNLP.

[19] Xiang Li, et al. CUNY-BLENDER TAC-KBP2010 Entity Linking and Slot Filling System Description, 2010, TAC.

[20] Hiroshi Nakagawa, et al. Person name disambiguation by bootstrapping, 2010, SIGIR.

[21] Simone Paolo Ponzetto, et al. BabelNet: Building a Very Large Multilingual Semantic Network, 2010, ACL.

[22] Xiang Li, et al. Top-Down and Bottom-Up: A Combined Approach to Slot Filling, 2010, AIRS.

[23] Heng Ji, et al. Overview of the TAC 2010 Knowledge Base Population Track, 2010.

[24] Gerhard Weikum, et al. Untangling the Cross-Lingual Link Structure of Wikipedia, 2010, ACL.

[25] Julio Gonzalo, et al. WePS-3 Evaluation Campaign: Overview of the Web People Search Clustering and Attribute Extraction Tasks, 2010, CLEF.

[26] Mark Dredze, et al. Entity Disambiguation for Knowledge Base Population, 2010, COLING.

[27] Zornitsa Kozareva, et al. Unsupervised Name Ambiguity Resolution Using A Generative Model, 2011, ULNLP@EMNLP.

[28] Michael Strube, et al. HITS' Cross-lingual Entity Linking System at TAC 2011: One Model for All Languages, 2011, TAC.

[29] Heng Ji, et al. Collaborative Ranking: A Case Study on Entity Linking, 2011, EMNLP.

[30] Heng Ji, et al. Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes, 2011, ULNLP@EMNLP.

[31] Xiang Li, et al. Joint inference for cross-document information extraction, 2011, CIKM '11.

[32] Heng Ji, et al. Mining Name Translations from Comparable Corpora by Creating Bilingual Information Networks, 2009, BUCC@ACL/IJCNLP.

[33] Vasudeva Varma, et al. Language independent identification of parallel sentences using Wikipedia, 2011, WWW.

[34] Douglas W. Oard, et al. Cross-Language Entity Linking, 2011, IJCNLP.

[35] Dan Roth, et al. GLOW TAC-KBP2011 Entity Linking System, 2011, TAC.

[36] Sean Monahan, et al. Cross-Lingual Cross-Document Coreference with Entity Linking, 2011, TAC.

[37] Bo Zhao, et al. Probabilistic topic models with biased propagation on heterogeneous information networks, 2011, KDD.

[38] Joseph Olive, et al. Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation, 2011.

[39] Douglas W. Oard, et al. Cross-Language Entity Linking in Maryland during a Hurricane, 2011, TAC.

[40] Doug Downey, et al. Local and Global Algorithms for Disambiguation to Wikipedia, 2011, ACL.