Enhancing Neural Sequence Labeling with Position-Aware Self-Attention

Sequence labeling is a fundamental and widely studied task in natural language processing. Recently, RNN-based sequence labeling models have gained increasing attention. Although these models achieve strong performance by learning successive (long short-term) dependencies, their sequential processing of inputs may limit their ability to capture non-continuous relations among tokens within a sentence. To tackle this problem, we focus on effectively modeling both the successive and the discrete dependencies of each token to improve sequence labeling performance. Specifically, we propose an attention-based model, position-aware self-attention (PSA), within a neural network architecture that exploits the positional information of an input sequence to capture latent relations among tokens. Extensive experiments on three classical sequence labeling tasks, part-of-speech (POS) tagging, named entity recognition (NER), and phrase chunking, demonstrate that our model outperforms state-of-the-art methods on various metrics without any external knowledge.
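The abstract does not spell out the exact formulation of PSA, so the following is only a minimal sketch of the general idea of position-aware self-attention: standard scaled dot-product attention augmented with a learned bias on the attention logits indexed by the (clipped) relative offset between tokens. The class name, the scalar-bias scheme, the `max_relative_distance` window, and the single-head setup are all illustrative assumptions, not the authors' actual layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PositionAwareSelfAttention(nn.Module):
    """Hypothetical sketch of position-aware self-attention: single-head
    self-attention whose logits receive a learned relative-position bias.
    This is an illustration of the general technique, not the paper's PSA."""

    def __init__(self, d_model: int, max_relative_distance: int = 16):
        super().__init__()
        self.d_model = d_model
        self.max_dist = max_relative_distance
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # One learned scalar bias per clipped relative offset in
        # [-max_dist, +max_dist] (assumption: scalar biases for illustration).
        self.rel_bias = nn.Embedding(2 * max_relative_distance + 1, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)

        # Standard scaled dot-product attention logits.
        scores = torch.matmul(q, k.transpose(-2, -1)) / self.d_model ** 0.5

        # Relative offsets j - i between positions, clipped to the window,
        # then shifted into [0, 2 * max_dist] to index the bias table.
        pos = torch.arange(seq_len, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_dist, self.max_dist)
        scores = scores + self.rel_bias(rel + self.max_dist).squeeze(-1)

        attn = F.softmax(scores, dim=-1)
        return torch.matmul(attn, v)  # (batch, seq_len, d_model)
```

In a typical sequence labeling stack of the kind the abstract describes, such a layer could plausibly sit on top of a (Bi)RNN encoder and feed a CRF or softmax tagger, letting the attention biases capture non-continuous token relations that sequential processing alone may miss; whether that matches the paper's architecture is not stated in the abstract.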
