Joint POS Tagging and Dependence Parsing With Transition-Based Neural Networks

While part-of-speech (POS) tagging and dependency parsing are observed to be closely related, existing work on joint modeling with manually crafted feature templates suffers from the feature sparsity and incompleteness problems. In this paper, we propose an approach to joint POS tagging and dependency parsing using transition-based neural networks. Three neural network based classifiers are designed to resolve shift/reduce, tagging, and labeling conflicts. Experiments show that our approach significantly outperforms previous methods for joint POS tagging and dependency parsing across a variety of natural languages.

[1]  Joakim Nivre,et al.  A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing , 2012, EMNLP.

[2]  Haizhou Li,et al.  Joint Models for Chinese POS Tagging and Dependency Parsing , 2011, EMNLP.

[3]  Jianfeng Gao,et al.  Bi-directional Attention with Agreement for Dependency Parsing , 2016, EMNLP.

[4]  Noah A. Smith,et al.  Improved Transition-based Parsing by Modeling Characters instead of Words with LSTMs , 2015, EMNLP.

[5]  Koby Crammer,et al.  Online Large-Margin Training of Dependency Parsers , 2005, ACL.

[6]  Wanxiang Che,et al.  A Separately Passive-Aggressive Training Algorithm for Joint POS Tagging and Dependency Parsing , 2012, COLING.

[7]  Noah A. Smith,et al.  Recurrent Neural Network Grammars , 2016, NAACL.

[8]  Noah A. Smith,et al.  What Do Recurrent Neural Network Grammars Learn About Syntax? , 2016, EACL.

[9]  Yuan Zhang,et al.  Stack-propagation: Improved Representation Learning for Syntax , 2016, ACL.

[10]  Slav Petrov,et al.  Improved Transition-Based Parsing and Tagging with Neural Networks , 2015, EMNLP.

[11]  Noah A. Smith,et al.  Many Languages, One Parser , 2016, TACL.

[12]  Wanxiang Che,et al.  Stacking Heterogeneous Joint Models of Chinese POS Tagging and Dependency Parsing , 2012, COLING.

[13]  Baobao Chang,et al.  Graph-based Dependency Parsing with Bidirectional LSTM , 2016, ACL.

[14]  Nianwen Xue,et al.  Joint POS Tagging and Transition-based Constituent Parsing in Chinese with Non-local Features , 2014, ACL.

[15]  Stephen Clark,et al.  A Tale of Two Parsers: Investigating and Combining Graph-based and Transition-based Dependency Parsing , 2008, EMNLP.

[16]  Eliyahu Kiperwasser,et al.  Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations , 2016, TACL.

[17]  Weiwei Sun,et al.  Capturing Long-distance Dependencies in Sequence Models: A Case Study of Chinese Part-of-speech Tagging , 2013, IJCNLP.

[18]  Jun'ichi Tsujii,et al.  Incremental Joint POS Tagging and Dependency Parsing in Chinese , 2011, IJCNLP.

[19]  Slav Petrov,et al.  Globally Normalized Transition-Based Neural Networks , 2016, ACL.

[20]  Wei Xu,et al.  Bidirectional LSTM-CRF Models for Sequence Tagging , 2015, ArXiv.

[21]  Joakim Nivre,et al.  Joint Morphological and Syntactic Analysis for Richly Inflected Languages , 2013, TACL.

[22]  Nianwen Xue,et al.  A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing , 2013, ACL.

[23]  Timothy Dozat,et al.  Deep Biaffine Attention for Neural Dependency Parsing , 2016, ICLR.

[24]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.

[25]  Christopher D. Manning,et al.  Generating Typed Dependency Parses from Phrase Structure Parses , 2006, LREC.

[26]  James Cross,et al.  Span-Based Constituency Parsing with a Structure-Label System and Provably Optimal Dynamic Oracles , 2016, EMNLP.

[27]  Yue Zhang,et al.  In-Order Transition-based Constituent Parsing , 2017, TACL.

[28]  Razvan Pascanu,et al.  On the difficulty of training recurrent neural networks , 2012, ICML.

[29]  Cícero Nogueira dos Santos,et al.  Learning Character-level Representations for Part-of-Speech Tagging , 2014, ICML.

[30]  Joakim Nivre,et al.  Algorithms for Deterministic Incremental Dependency Parsing , 2008, CL.

[31]  Danqi Chen,et al.  A Fast and Accurate Dependency Parser using Neural Networks , 2014, EMNLP.

[32]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[33]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[34]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..