Does Higher Order LSTM Have Better Accuracy in Chunking and Named Entity Recognition?

Current research usually adopts a single-order setting by default when dealing with sequence labeling tasks. In our work, "order" refers to the number of tags that a prediction involves at each time step. Higher-order models tend to capture more dependency information among tags. We first propose a simple method by which low-order models can be easily extended to higher-order models. To our surprise, the higher-order models, which are supposed to capture more dependency information, perform worse as the order increases. We conjecture that forcing neural networks to learn complex structure may lead to overfitting. To address this problem, we propose a method that combines low-order and high-order information to decode the tag sequence. The proposed method, multi-order decoding (MOD), remains scalable to high orders through a pruning technique. MOD achieves higher accuracy than existing single-order methods: it yields a 21% error reduction over the baselines in chunking and error reductions of over 23% in two NER tasks. The code is available at this https URL
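To make the notion of "order" concrete, below is a minimal sketch (ours, not the authors' released code) of how a first-order BiLSTM tagger can be extended to order k by having the output layer score tag k-grams at each time step. All names (`OrderKTagger`, the dimensions, etc.) are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of an order-k sequence tagger (illustrative, not the authors' code).
# An order-1 model scores single tags per step; an order-k model scores tag k-grams,
# so the output label space grows as num_tags ** order.
import torch
import torch.nn as nn

class OrderKTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, order, embed_dim=100, hidden_dim=200):
        super().__init__()
        self.order = order
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            bidirectional=True, batch_first=True)
        # Projection onto all tag k-grams; for order=1 this reduces to a
        # standard per-token tag classifier.
        self.proj = nn.Linear(2 * hidden_dim, num_tags ** order)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> k-gram scores: (batch, seq_len, num_tags**order)
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.proj(hidden)
```

The exponential growth of the label space with the order is why, as described above, MOD combines scores from several orders and prunes unlikely tag combinations at decode time rather than relying on a single high-order model.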
