Sequence Labeling With Deep Gated Dual Path CNN

Sequence labeling, such as part-of-speech (POS) tagging, named entity recognition (NER), and text chunking, is a classic task in natural language processing. Most existing neural network models for sequence labeling are based on recurrent neural networks. Recently, convolutional neural networks have been proposed to replace the recurrent components for sequence labeling. However, they are usually shallow compared to the deep convolutional networks that achieve state-of-the-art performance in other fields. Due to the vanishing gradient problem, these models usually cannot work well when the number of layers is simply increased. In this paper, we propose a deep CNN architecture for sequence labeling that can capture a large context through stacked convolutions. To mitigate the vanishing gradient problem, the proposed method incorporates gated linear units, residual connections, and dense connections. Experimental results on three sequence labeling tasks show that the proposed model achieves performance competitive with the RNN-based state-of-the-art method while running $2.41\times$ faster, even with up to 10 convolutional layers.
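The core building block the abstract describes is a convolution whose output is split into two halves, one gating the other (a gated linear unit), wrapped in a residual connection. The sketch below, a hypothetical NumPy illustration rather than the paper's exact layer (`glu_conv_block` and its parameter shapes are assumptions for demonstration), shows how GLU gating $A \odot \sigma(B)$ and the residual sum preserve sequence length and dimension:

```python
import numpy as np

def glu_conv_block(x, w, b):
    """One gated convolutional block with a residual connection (a sketch).

    x: (seq_len, d) input sequence of d-dimensional token representations.
    w: (k, d, 2*d) convolution filter producing 2*d channels per position.
    b: (2*d,) bias.
    Returns a (seq_len, d) array: x + GLU(conv(x)).
    """
    k, d, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # same-length ("same") padding
    # Convolution: at each position, contract the (k, d) window with w.
    conv = np.stack([
        np.tensordot(xp[t:t + k], w, axes=([0, 1], [0, 1]))
        for t in range(x.shape[0])
    ]) + b  # shape (seq_len, 2*d)
    a, g = conv[:, :d], conv[:, d:]
    gated = a * (1.0 / (1.0 + np.exp(-g)))  # GLU: A ⊙ sigmoid(B)
    return x + gated  # residual connection keeps gradients flowing

rng = np.random.default_rng(0)
seq_len, d, k = 7, 4, 3
x = rng.standard_normal((seq_len, d))
w = rng.standard_normal((k, d, 2 * d)) * 0.1
b = np.zeros(2 * d)
y = glu_conv_block(x, w, b)
print(y.shape)  # → (7, 4)
```

Because the residual path carries the input unchanged and the sigmoid gate bounds the multiplicative term, stacking several such blocks (the paper stacks up to 10) widens the receptive field without the gradient attenuation that plagues plain deep stacks.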
