Document-Level Named Entity Recognition by Incorporating Global and Neighbor Features

State-of-the-art named entity recognition models mostly process the sentences of a document independently. Sentence-level named entity recognition tends to cause tagging inconsistencies across long documents. In this paper, we propose a neural network that encodes both global consistency and neighbor relevance among the occurrences of a particular token within a document. We first encode each sentence independently with a sentence-level BiLSTM layer, then apply a document-level module that models the relations between occurrences of the same token: a CNN encodes global consistency features, while a BiLSTM models neighbor relevance features. A gate then fuses these two non-local feature streams, and a CRF layer decodes the final labels. We evaluate our model on the CoNLL-2003 dataset, and experimental results show that it outperforms existing methods.
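The gating step described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: it assumes a common formulation in which a sigmoid gate, computed from the concatenated global and neighbor feature vectors, produces an element-wise convex combination of the two streams. The function name `gated_fusion` and the exact gating formula are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(global_feat, neighbor_feat, W, b):
    """Fuse two non-local feature vectors with a learned gate.

    Assumed form: g = sigmoid([global; neighbor] @ W + b)
                  fused = g * global + (1 - g) * neighbor
    """
    concat = np.concatenate([global_feat, neighbor_feat], axis=-1)
    g = sigmoid(concat @ W + b)  # gate values in (0, 1), one per dimension
    return g * global_feat + (1 - g) * neighbor_feat

# Toy example: fuse two 8-dimensional feature vectors.
rng = np.random.default_rng(0)
d = 8
global_feat = rng.standard_normal(d)    # e.g. CNN global-consistency features
neighbor_feat = rng.standard_normal(d)  # e.g. BiLSTM neighbor-relevance features
W = rng.standard_normal((2 * d, d)) * 0.1
b = np.zeros(d)

fused = gated_fusion(global_feat, neighbor_feat, W, b)
print(fused.shape)  # same dimensionality as each input stream
```

Because the gate lies in (0, 1), each fused dimension falls between the corresponding global and neighbor feature values, so the model can smoothly interpolate between the two non-local signals per dimension.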
