Does Higher Order LSTM Have Better Accuracy in Chunking and Named Entity Recognition?

Current research usually adopts a single-order setting by default when dealing with sequence labeling tasks. In our work, "order" refers to the number of tags that a prediction involves at each time step. Higher-order models tend to capture more dependency information among tags. We first propose a simple method by which low-order models can be easily extended to higher-order models. To our surprise, the higher-order models, which are supposed to capture more dependency information, perform worse as the order increases. We conjecture that forcing neural networks to learn complex structure may lead to overfitting. To address this problem, we propose a method that combines low-order and high-order information to decode the tag sequence. The proposed method, multi-order decoding (MOD), remains scalable to high orders through a pruning technique. MOD achieves higher accuracy than existing single-order methods: it yields a 21% error reduction over the baselines in chunking and error reductions of over 23% in two NER tasks. The code is available at this https URL
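To make the notion of "order" concrete, below is a minimal sketch (ours, not the authors' released code) of how a first-order BiLSTM tagger can be extended to order k by having the output layer score tag k-grams at each time step. All names (`OrderKTagger`, the dimensions, etc.) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of an order-k sequence tagger (illustrative, not the authors' code).
# An order-1 model scores single tags per step; an order-k model scores tag k-grams,
# so the output label space grows as num_tags ** order.
import torch
import torch.nn as nn

class OrderKTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, order, embed_dim=100, hidden_dim=200):
        super().__init__()
        self.order = order
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            bidirectional=True, batch_first=True)
        # Projection onto all tag k-grams; for order=1 this reduces to a
        # standard per-token tag classifier.
        self.proj = nn.Linear(2 * hidden_dim, num_tags ** order)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> k-gram scores: (batch, seq_len, num_tags**order)
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.proj(hidden)
```

The exponential growth of the label space with the order is why, as described above, MOD combines scores from several orders and prunes unlikely tag combinations at decode time rather than relying on a single high-order model.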
