TENER: Adapting Transformer Encoder for Named Entity Recognition

Bidirectional long short-term memory networks (BiLSTMs) have been widely used as encoders in models for the named entity recognition (NER) task. Recently, the Transformer has been broadly adopted in various natural language processing (NLP) tasks owing to its parallelism and strong performance. Nevertheless, the Transformer's performance on NER lags behind its performance on other NLP tasks. In this paper, we propose TENER, an NER architecture that adopts an adapted Transformer encoder to model both character-level and word-level features. By incorporating direction- and relative-distance-aware attention and un-scaled attention, we show that a Transformer-like encoder is just as effective for NER as it is for other NLP tasks.
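To make the adaptation concrete, below is a minimal single-head sketch of direction- and relative-distance-aware, un-scaled attention in the spirit of TENER. It follows the Transformer-XL-style decomposition of attention scores into content and relative-position terms; the function names, the sinusoidal relative encoding, and the learned biases u and v are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def relative_sinusoid(max_len, d):
    """Sinusoidal embeddings of signed offsets -(L-1)..(L-1); d must be even.

    sin is an odd function of the offset, so the sin half flips sign with
    direction while the cos half does not: the encoding carries both the
    distance and the direction of token j relative to token t.
    """
    offsets = np.arange(-(max_len - 1), max_len, dtype=np.float64)[:, None]
    inv_freq = 1.0 / (10000 ** (np.arange(0, d, 2, dtype=np.float64) / d))
    angles = offsets * inv_freq[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (2L-1, d)

def tener_style_attention(x, Wq, Wk, Wv, u, v):
    """Single-head, un-scaled attention with relative-position terms.

    x: (L, d) token representations; Wq, Wk, Wv: (d, d) projections;
    u, v: (d,) learned biases for the content- and position-based terms
    (names borrowed from Transformer-XL; assumed, not from the paper).
    """
    L, d = x.shape
    q, k, val = x @ Wq, x @ Wk, x @ Wv
    rel = relative_sinusoid(L, d)                    # (2L-1, d)
    # R[t, j] = embedding of the signed offset (t - j)
    idx = (np.arange(L)[:, None] - np.arange(L)[None, :]) + (L - 1)
    R = rel[idx]                                     # (L, L, d)
    # A[t, j] = q_t.k_j + q_t.R_{t-j} + u.k_j + v.R_{t-j}, with NO 1/sqrt(d)
    scores = (q @ k.T
              + np.einsum('td,tjd->tj', q, R)
              + k @ u                                # broadcasts over t
              + np.einsum('d,tjd->tj', v, R))
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)        # row-wise softmax
    return weights @ val
```

Dropping the 1/sqrt(d) scaling leaves the softmax sharper; the intuition argued in the paper is that such sparser attention suits NER, where a tag is typically determined by a few context words rather than a smooth average over the whole sentence.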
