HITS' Monolingual and Cross-lingual Entity Linking System at TAC 2012: A Joint Approach

This paper presents HITS’ system for monolingual and cross-lingual entity linking at TAC 2012. We propose a joint system for entity disambiguation, recognition of NILs and clustering using Markov Logic. The proposed model (1) is global, i.e. a group of mentions in a text is disambiguated in one single step combining various global and local features, and (2) performs disambiguation, unknown entity detection and clustering jointly. The model for all languages is trained exclusively on English Wikipedia articles. The results achieved in the TAC monolingual and cross-lingual entity linking tasks show that our approach is competitive: our best English run scores 8.5 percentage points above the median, and our system outperformed all other participating systems in the Chinese cross-lingual subtask. Our official results for the Spanish subtask are lower because of a bug; after fixing it, our unofficial Spanish results are close to those of the best system.
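The difference between global and local disambiguation can be illustrated with a toy sketch. This is not the HITS Markov Logic model. It assumes hypothetical candidate lists, hand-set weights for a local mention-entity score, a pairwise coherence score between linked entities, and a fixed weight for assigning NIL. It then finds the highest-scoring joint assignment for all mentions at once by brute force. MAP inference in Markov Logic likewise seeks the world that maximizes the summed weights of satisfied ground formulas, but it relies on much more efficient solvers. NIL clustering is omitted here.

```python
from itertools import product

NIL = "NIL"  # label for mentions with no knowledge-base entry

def joint_map(mentions, candidates, local, coherence, nil_weight):
    """Brute-force search for the best joint assignment of entities to mentions.

    All names and weights are hypothetical illustrations:
    candidates: mention -> list of candidate entity ids (NIL is added automatically)
    local:      (mention, entity) -> local compatibility weight
    coherence:  frozenset({entity_a, entity_b}) -> pairwise coherence weight
    nil_weight: weight earned by labelling a mention NIL
    """
    options = [candidates[m] + [NIL] for m in mentions]
    best, best_score = None, float("-inf")
    for assignment in product(*options):
        # Local evidence: each mention scored on its own.
        score = sum(nil_weight if e == NIL else local.get((m, e), 0.0)
                    for m, e in zip(mentions, assignment))
        # Global evidence: reward pairs of distinct linked entities that fit together.
        for i in range(len(assignment)):
            for j in range(i + 1, len(assignment)):
                a, b = assignment[i], assignment[j]
                if a != NIL and b != NIL and a != b:
                    score += coherence.get(frozenset((a, b)), 0.0)
        if score > best_score:
            best, best_score = assignment, score
    return dict(zip(mentions, best)), best_score

# Toy input: locally, "Jordan" prefers the country, but coherence with the
# Chicago Bulls makes the basketball player the better joint choice.
mentions = ["Jordan", "Chicago Bulls"]
candidates = {"Jordan": ["Michael_Jordan", "Jordan_(country)"],
              "Chicago Bulls": ["Chicago_Bulls"]}
local = {("Jordan", "Michael_Jordan"): 0.4,
         ("Jordan", "Jordan_(country)"): 0.6,
         ("Chicago Bulls", "Chicago_Bulls"): 0.9}
coherence = {frozenset(("Michael_Jordan", "Chicago_Bulls")): 0.5}

assignment, score = joint_map(mentions, candidates, local, coherence, nil_weight=0.1)
print(assignment, round(score, 2))
```

Exhaustive enumeration grows exponentially with the number of mentions, so it only serves to show what "one single step" means: each mention's label depends on the labels chosen for the others.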
