Linking Named Entities to Any Database

Existing techniques for disambiguating named entities in text mostly focus on Wikipedia as a target catalog of entities. Yet for many types of entities, such as restaurants and cult movies, relational databases exist that contain far more extensive information than Wikipedia. This paper introduces a new task, called Open-Database Named-Entity Disambiguation (Open-DB NED), in which a system must be able to resolve named entities to symbols in an arbitrary database, without requiring labeled data for each new database. We introduce two techniques for Open-DB NED, one based on distant supervision and the other based on domain adaptation. In experiments on two domains, one with poor coverage by Wikipedia and the other with near-perfect coverage, our Open-DB NED strategies outperform a state-of-the-art Wikipedia NED system by over 25% in accuracy.
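The following is a minimal Python sketch of the Open-DB NED task setting. It makes some hypothetical assumptions:

- A database is just a list of rows with arbitrary column names.
- One column, passed as the illustrative parameter `name_col`, holds entity names.
- Every function name here is illustrative.

The heuristics are simple stand-ins, not the paper's distant-supervision or domain-adaptation models. The linker ranks candidate rows only by word overlap between the mention's context and the row's cell values, so it needs no labeled data for a new database. `distant_labels` shows one common distant-supervision idea: a mention that matches exactly one row is treated as a noisy positive training example.

```python
from typing import Optional

# Hypothetical representation: a row maps column name -> cell value.
# No particular schema is assumed.
Row = dict[str, str]


def tokens(text: str) -> set[str]:
    """Lowercased whitespace tokens with surrounding punctuation stripped."""
    return {t.strip(".,;:!?\"'()").lower() for t in text.split()} - {""}


def candidates(mention: str, db: list[Row], name_col: str) -> list[int]:
    """Indices of rows whose name cell contains every token of the mention."""
    m = tokens(mention)
    return [i for i, row in enumerate(db) if m <= tokens(row[name_col])]


def context_score(context: str, row: Row) -> int:
    """Schema-agnostic feature: count of context tokens found in ANY cell."""
    cells = set().union(*(tokens(v) for v in row.values()))
    return len(tokens(context) & cells)


def link(mention: str, context: str, db: list[Row], name_col: str) -> Optional[int]:
    """Best-scoring candidate row index, or None if no row matches the name.

    Ties go to the earliest row, because max() returns the first maximum.
    """
    cands = candidates(mention, db, name_col)
    if not cands:
        return None
    return max(cands, key=lambda i: context_score(context, db[i]))


def distant_labels(
    docs: list[tuple[str, str]], db: list[Row], name_col: str
) -> list[tuple[str, str, int]]:
    """Distant-supervision heuristic (an assumption, not the paper's procedure).

    A (mention, context) pair whose mention matches exactly one row becomes a
    noisy (mention, context, row_index) training example.
    """
    labeled = []
    for mention, context in docs:
        cands = candidates(mention, db, name_col)
        if len(cands) == 1:
            labeled.append((mention, context, cands[0]))
    return labeled
```

Consider a table with rows for "Joe's Pizza" (Chicago) and "Joe's Diner" (Philadelphia). The ambiguous mention "Joe's", in a context that mentions Chicago and pizza, links to the first row. Because no feature refers to a specific column, the same code runs unchanged on a different table, which is the property Open-DB NED asks for. A realistic system would train a scorer on the distantly labeled examples instead of using a raw overlap count.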
