Lightweight Multilingual Entity Extraction and Linking

Text analytics systems often rely heavily on detecting entity mentions in documents and linking them to knowledge bases for downstream applications such as sentiment analysis, question answering, and recommender systems. A major challenge for this task is accurately detecting entities in new languages with limited labeled resources. In this paper, we present an accurate, lightweight, multilingual named entity recognition (NER) and linking (NEL) system. The contributions of this paper are threefold: 1) lightweight named entity recognition with competitive accuracy; 2) candidate entity retrieval that uses search click-log data and entity embeddings to achieve high precision with a low memory footprint; and 3) efficient entity disambiguation. Our system achieves state-of-the-art performance on TAC KBP 2013 multilingual data and on English AIDA CoNLL data.
