Named Entity Disambiguation for Noisy Text

We address the task of Named Entity Disambiguation (NED) for noisy text. We present WikilinksNED, a large-scale NED dataset of text fragments from the web, which is significantly noisier and more challenging than existing news-based datasets. To capture the limited and noisy local context surrounding each mention, we design a neural model and train it with a novel method for sampling informative negative examples. We also describe a new method for initializing word and entity embeddings that substantially improves performance. Our model outperforms existing state-of-the-art methods on WikilinksNED by a significant margin, while achieving comparable performance on a smaller newswire dataset.
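The abstract does not spell out how informative negative examples are chosen. As a purely illustrative sketch, assume each mention comes with a set of candidate entities and link counts (how often that mention text points to each entity). One generic approach then draws negatives from the non-gold candidates with probability proportional to those counts, so confusable entities are sampled more often than rare ones. The function `sample_negatives` and the example counts are hypothetical, and this is not claimed to be the sampling method used in this work.

```python
import random
from typing import Dict, List


def sample_negatives(
    candidate_counts: Dict[str, int],
    gold: str,
    k: int,
    rng: random.Random,
) -> List[str]:
    """Draw k negative entities for one mention, with replacement.

    Hypothetical sketch (not the paper's method): negatives are candidates
    other than the gold entity, weighted by how often the mention text
    links to them, so plausible "near-miss" entities are picked more often.
    """
    # Keep only non-gold candidates that have a positive link count.
    pool = {e: c for e, c in candidate_counts.items() if e != gold and c > 0}
    if not pool or k <= 0:
        return []
    entities = list(pool)
    weights = [pool[e] for e in entities]
    # random.Random.choices samples with replacement, proportional to weights.
    return rng.choices(entities, weights=weights, k=k)


# Usage with made-up counts for the mention "Paris" whose gold entity is Paris.
counts = {"Paris": 900, "Paris_Hilton": 80, "Paris,_Texas": 20}
negatives = sample_negatives(counts, gold="Paris", k=5, rng=random.Random(0))
print(negatives)
```

Weighting by link counts concentrates training on near-miss candidates; a uniform draw over the whole knowledge base would mostly yield trivially wrong entities that teach the model little. Sampling with replacement lets a mention with only a few candidates still produce k negatives.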
