Fast and Accurate Entity Linking via Graph Embedding

Entity Linking, the task of mapping ambiguous Named Entities to unique identifiers in a knowledge base, is a cornerstone of many Information Retrieval and Text Analysis systems. So far, no single entity linking algorithm has offered the accuracy and scalability required to handle the ever-increasing amount of data on the web and become a de facto standard. In this paper, we propose a framework for entity linking that leverages graph embeddings to perform collective disambiguation. The framework is modular: it supports pluggable algorithms for embedding generation and candidate ranking. With this framework, we implement and evaluate a reference pipeline that uses DBpedia as its knowledge base and relies on dedicated algorithms for fast candidate search and high-performance state-space search optimization. Compared to existing solutions, our approach offers state-of-the-art accuracy on a variety of datasets without any supervised training, and it runs in real time even on documents with dozens of Named Entities. Lastly, the flexibility of the framework makes it adaptable to a multitude of scenarios by trading off accuracy against execution time.
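To make the idea of collective disambiguation over graph embeddings more concrete, the following is a minimal sketch, not the pipeline evaluated in the paper. It assumes that each mention already has a list of candidate entities and that every candidate has an embedding vector. It then runs a simple greedy local search, used here only as a stand-in for the state-space search optimization mentioned above, to pick the combination of candidates whose embeddings are most similar to one another. The function names, the entity labels, and the 2-dimensional vectors are all hypothetical and chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def coherence(assignment, embeddings):
    """Sum of pairwise cosine similarities among the chosen entities."""
    total = 0.0
    for i in range(len(assignment)):
        for j in range(i + 1, len(assignment)):
            total += cosine(embeddings[assignment[i]], embeddings[assignment[j]])
    return total


def link_collectively(candidates, embeddings, max_rounds=10):
    """Greedy local search (an assumption, not the paper's optimizer).

    Starts from the first candidate of every mention and repeatedly swaps
    in any single candidate that raises the global coherence score.
    """
    current = [options[0] for options in candidates]
    best = coherence(current, embeddings)
    for _ in range(max_rounds):
        improved = False
        for i, options in enumerate(candidates):
            for cand in options:
                if cand == current[i]:
                    continue
                trial = current[:i] + [cand] + current[i + 1:]
                score = coherence(trial, embeddings)
                if score > best:
                    current, best, improved = trial, score, True
        if not improved:
            break
    return current


# Hypothetical toy embeddings; real ones would come from a graph
# embedding algorithm run over the knowledge base.
toy_embeddings = {
    "Paris_(city)": [1.0, 0.1],
    "Paris_Hilton": [0.1, 1.0],
    "France": [0.9, 0.2],
    "Eiffel_Tower": [0.95, 0.15],
}

# One candidate list per mention, in document order.
toy_candidates = [
    ["Paris_Hilton", "Paris_(city)"],  # mention "Paris"
    ["France"],                        # mention "France"
    ["Eiffel_Tower"],                  # mention "Eiffel Tower"
]

linked = link_collectively(toy_candidates, toy_embeddings)
print(linked)
```

In this toy setting the ambiguous mention "Paris" is resolved by the other mentions in the document rather than by any trained model, which mirrors the unsupervised, collective flavor described above. Bounding the number of search rounds is one simple way to trade accuracy for execution time.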
