Learning Context Using Segment-Level LSTM for Neural Sequence Labeling

This article introduces an approach that learns segment-level context for sequence labeling in natural language processing (NLP). Previous approaches limit their basic unit of feature extraction to the word because sequence labeling is a token-level task in which labels are annotated word by word. However, the text segment is the ultimate unit of labeling, and segment information can easily be obtained from labels annotated in an IOB/IOBES format. Most neural sequence labeling models expand their learning capacity by adding layers, such as a character-level layer, or by jointly training NLP tasks that share common knowledge. Our model builds on the charLSTM-BiLSTM-CRF architecture and extends it with an additional segment-level layer called segLSTM. We therefore propose a sequence labeling algorithm, charLSTM-BiLSTM-CRF-segLSTM$^{sLM}$, which employs an additional segment-level long short-term memory (LSTM) that learns features from the adjacent context within a segment. We evaluate our model on four sequence labeling datasets: Penn Treebank, CoNLL 2000, CoNLL 2003, and OntoNotes 5.0. Experimental results show that our model outperforms state-of-the-art variants of BiLSTM-CRF. In particular, the proposed model improves performance on tasks that require finding appropriate labels for multi-token segments.
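The sketch below illustrates the segment-level idea in PyTorch, not the authors' released implementation: word-level BiLSTM features are re-encoded per segment by a second LSTM (the segment-level layer), and the resulting segment context is concatenated back onto each token before scoring tags. The character-level LSTM and CRF decoder of the full model are omitted for brevity, and all class names, dimensions, and the span-derivation step are illustrative assumptions.

```python
# Minimal sketch (assumed names and dimensions, CRF and char-LSTM omitted):
# re-encode the tokens of each segment with a segment-level LSTM and
# concatenate the segment summary back onto every token in that segment.
import torch
import torch.nn as nn


class SegmentContextTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=200, seg_dim=100, num_tags=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # token-level encoder: BiLSTM over the whole sentence
        self.word_lstm = nn.LSTM(emb_dim, hidden_dim // 2,
                                 batch_first=True, bidirectional=True)
        # segment-level encoder run over the tokens of a single segment
        self.seg_lstm = nn.LSTM(hidden_dim, seg_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim + seg_dim, num_tags)

    def forward(self, token_ids, segments):
        # token_ids: (1, seq_len); segments: list of (start, end) spans,
        # e.g. derived from IOB/IOBES labels and covering the sentence.
        h, _ = self.word_lstm(self.embed(token_ids))        # (1, seq_len, hidden_dim)
        seg_context = torch.zeros(1, token_ids.size(1), self.seg_lstm.hidden_size)
        for start, end in segments:
            span = h[:, start:end, :]                       # tokens of one segment
            _, (last, _) = self.seg_lstm(span)              # segment summary
            # broadcast the summary to every token of the segment
            seg_context[:, start:end, :] = last.transpose(0, 1)
        return self.out(torch.cat([h, seg_context], dim=-1))  # per-token tag scores


# usage: score a 5-token sentence split into segments [0,2) and [2,5)
model = SegmentContextTagger(vocab_size=1000)
scores = model(torch.randint(0, 1000, (1, 5)), segments=[(0, 2), (2, 5)])
print(scores.shape)  # torch.Size([1, 5, 10])
```

In the full model described above, these per-token scores would feed a CRF layer rather than being used directly.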
