Learning to Collectively Link Entities

Recently, Kulkarni et al. [1] proposed an approach for the collective disambiguation of entity mentions in natural language text. Their model performs disambiguation by efficiently computing exact MAP inference in a Markov Random Field with binary labels. Here, we build on their disambiguation model and propose an approach to jointly learn the node and edge parameters of such a model. For collective learning, we use a max-margin framework that is efficiently implemented with projected subgradient descent. We use this learner in an online, interactive annotation system that incrementally trains the model as data is progressively curated. We demonstrate the usefulness of our system by manually completing the annotations for a subset of the Wikipedia collection, and we have made this data publicly available. Evaluation shows that learning the model parameters improves performance and that our system outperforms several other systems, including that of Kulkarni et al.
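The abstract gives no formulas, so the following is only a minimal Python sketch of the general technique under stated assumptions, not the authors' implementation. It assumes, hypothetically, that each node is a candidate (mention, entity) pair whose label 1 means "link", that a labeling y scores Σ_i y_i·(w·f_i) + Σ_(i,j) y_i·y_j·(u·g_ij) with non-negative edge features g_ij, and that there is no one-entity-per-mention constraint. Keeping u ≥ 0 makes every edge score non-negative, so exact MAP inference reduces to an s-t minimum cut (computed here with a plain Edmonds-Karp max-flow). Learning minimizes a max-margin structured hinge loss with a Hamming margin by projected subgradient: loss-augmented MAP inference, a subgradient step, then projection of u onto u ≥ 0 so that inference stays exact. The function names, features, toy data, and hyperparameters (`lr`, `reg`, `epochs`) are illustrative assumptions.

```python
from collections import deque

def min_cut_source_side(n, edges, s, t):
    """Edmonds-Karp max-flow; returns the nodes reachable from s in the final residual graph."""
    cap = [dict() for _ in range(n)]
    for u, v, c in edges:
        if c > 0:
            cap[u][v] = cap[u].get(v, 0.0) + c
            cap[v].setdefault(u, 0.0)  # reverse residual edge
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)  # source side of a minimum s-t cut
        flow, v = float("inf"), t
        while parent[v] is not None:  # bottleneck residual capacity on the path
            flow, v = min(flow, cap[parent[v]][v]), parent[v]
        v = t
        while parent[v] is not None:  # push the flow and update residual capacities
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

def map_inference(node_scores, edge_scores):
    """Exact argmax over y in {0,1}^n of sum_i a_i*y_i + sum_(i,j) theta_ij*y_i*y_j, theta >= 0."""
    n = len(node_scores)
    s, t = n, n + 1
    c = [-a for a in node_scores]  # unary coefficients of the energy E(y) = -score(y)
    edges = []
    for (i, j), theta in edge_scores.items():
        assert theta >= 0, "edge scores must be non-negative for min-cut to be exact"
        c[i] -= theta  # -theta*y_i*y_j = -theta*y_i + theta*y_i*(1 - y_j)
        edges.append((i, j, theta))  # cut iff y_i = 1 (source side) and y_j = 0
    for i, ci in enumerate(c):
        if ci > 0:
            edges.append((i, t, ci))  # pay ci when y_i = 1
        elif ci < 0:
            edges.append((s, i, -ci))  # pay -ci when y_i = 0
    source_side = min_cut_source_side(n + 2, edges, s, t)
    return [1 if i in source_side else 0 for i in range(n)]

def dot(w, x):
    return sum(p * q for p, q in zip(w, x))

def joint_features(y, node_feats, edge_feats, dn, de):
    """phi(y) = (sum_i y_i * f_i, sum_(i,j) y_i * y_j * g_ij)."""
    fn, fe = [0.0] * dn, [0.0] * de
    for yi, f in zip(y, node_feats):
        if yi:
            fn = [p + q for p, q in zip(fn, f)]
    for (i, j), g in edge_feats.items():
        if y[i] and y[j]:
            fe = [p + q for p, q in zip(fe, g)]
    return fn, fe

def train_online(docs, dn, de, epochs=50, lr=0.1, reg=0.01, w=None, u=None):
    """Projected subgradient descent on the structured hinge loss (Hamming margin).
    docs: list of (node_feats, edge_feats, y_true); edge features must be non-negative.
    Passing the previous (w, u) back in resumes training when new documents are curated."""
    w = list(w) if w is not None else [0.0] * dn
    u = list(u) if u is not None else [0.0] * de
    for _ in range(epochs):
        for node_feats, edge_feats, y_true in docs:
            # Loss-augmented inference: the Hamming loss adds (1 - 2*y_true_i) to node i.
            a = [dot(w, f) + 1 - 2 * yt for f, yt in zip(node_feats, y_true)]
            theta = {e: dot(u, g) for e, g in edge_feats.items()}
            y_hat = map_inference(a, theta)
            fn_hat, fe_hat = joint_features(y_hat, node_feats, edge_feats, dn, de)
            fn_gold, fe_gold = joint_features(y_true, node_feats, edge_feats, dn, de)
            # Subgradient: reg * weights + phi(y_hat) - phi(y_true).
            w = [wk - lr * (reg * wk + h - g) for wk, h, g in zip(w, fn_hat, fn_gold)]
            # Projecting u onto u >= 0 keeps all edge scores non-negative (exact min-cut).
            u = [max(0.0, uk - lr * (reg * uk + h - g)) for uk, h, g in zip(u, fe_hat, fe_gold)]
    return w, u

def predict(node_feats, edge_feats, w, u):
    """Collective MAP labeling of all candidate nodes of one document."""
    theta = {e: dot(u, g) for e, g in edge_feats.items()}
    return map_inference([dot(w, f) for f in node_feats], theta)

if __name__ == "__main__":
    # Made-up toy data: node features [local_sim, 1 - local_sim]; edge feature = relatedness.
    doc = ([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]],
           {(0, 2): [1.0], (0, 3): [0.0], (1, 2): [0.0], (1, 3): [0.0]}, [1, 0, 1, 0])
    w, u = train_online([doc], dn=2, de=1)
    # Nodes 0 and 1 look identical locally; only node 0 is related to the confident node 2.
    print(predict([[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]], {(0, 2): [1.0], (1, 2): [0.0]}, w, u))
```

In the toy run, nodes 0 and 1 of the test document have identical local features, so only the learned weight on the relatedness edge to node 2 can separate them; this is the kind of collective effect the abstract refers to. Warm-starting `train_online` from the previous `(w, u)` is just one simple, assumed way to retrain incrementally as annotators curate more documents.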

[1] Ganesh Ramakrishnan et al. Collective annotation of Wikipedia entities in web text. KDD, 2009.

[2] Massimiliano Ciaramita et al. A framework for benchmarking entity-annotation systems. WWW, 2013.

[3] Gerhard Weikum et al. Robust Disambiguation of Named Entities in Text. EMNLP, 2011.

[4] Xianpei Han et al. An Entity-Topic Model for Entity Linking. EMNLP, 2012.

[5] David D. Lewis et al. Heterogeneous Uncertainty Sampling for Supervised Learning. ICML, 1994.

[6] Razvan C. Bunescu et al. Using Encyclopedic Knowledge for Named Entity Disambiguation. EACL, 2006.

[7] Andrew McCallum et al. An Entity Based Model for Coreference Resolution. SDM, 2009.

[8] Ishwar K. Sethi et al. Confidence-based active learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006.

[9] Jun Zhao et al. Collective entity linking in web text: a graph-based method. SIGIR, 2011.

[10] Ben Taskar et al. Online, self-supervised terrain classification via discriminatively trained submodular Markov random fields. IEEE International Conference on Robotics and Automation, 2008.

[11] Martha Palmer et al. An Empirical Study of the Behavior of Active Learning for Word Sense Disambiguation. NAACL, 2006.

[12] Hinrich Schütze et al. The SMAPH system for query entity recognition and disambiguation. ERD, 2014.

[13] Ben Taskar et al. Learning associative Markov networks. ICML, 2004.

[14] Vladimir Kolmogorov et al. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001.

[15] Silviu Cucerzan et al. Large-Scale Named Entity Disambiguation Based on Wikipedia Data. EMNLP, 2007.

[16] Lan Nie et al. Resolving Surface Forms to Wikipedia Topics. COLING, 2010.

[17] Michael Strube et al. Jointly Disambiguating and Clustering Concepts and Entities with Markov Logic. COLING, 2012.

[18] Rada Mihalcea et al. Wikify!: linking documents to encyclopedic knowledge. CIKM, 2007.

[19] Heng Ji et al. Overview of the TAC 2010 Knowledge Base Population Track. 2010.

[20] David Yarowsky et al. One Sense Per Discourse. HLT, 1992.

[21] Lise Getoor et al. A Latent Dirichlet Model for Unsupervised Entity Resolution. SDM, 2005.

[22] Paolo Ferragina et al. TAGME: on-the-fly annotation of short text fragments (by Wikipedia entities). CIKM, 2010.

[23] Doug Downey et al. Local and Global Algorithms for Disambiguation to Wikipedia. ACL, 2011.

[24] Soumen Chakrabarti et al. Optimizing scoring functions and indexes for proximity search in type-annotated corpora. WWW, 2006.

[25] Andrew McCallum et al. An Integrated, Conditional Model of Information Extraction and Coreference with Appli. UAI, 2004.

[26] Rajeev Rastogi et al. Entity disambiguation with hierarchical topic models. KDD, 2011.

[27] Ming-Wei Chang et al. Importance of Semantic Representation: Dataless Classification. AAAI, 2008.

[28] Ramanathan V. Guha et al. SemTag and seeker: bootstrapping the semantic web via automated semantic annotation. WWW, 2003.

[29] Gerhard Weikum et al. NAGA: harvesting, searching and ranking knowledge. SIGMOD Conference, 2008.

[30] Pararth Shah et al. System for collective entity disambiguation. ERD, 2014.

[31] Xianpei Han et al. Named entity disambiguation by leveraging Wikipedia semantic knowledge. CIKM, 2009.

[32] Oren Etzioni et al. Entity Linking at Web Scale. AKBC-WEKEX@NAACL-HLT, 2012.

[33] Ian H. Witten et al. Learning to link with Wikipedia. CIKM, 2008.