Transition-based Neural Constituent Parsing

Constituent parsing is typically modeled by a chart-based algorithm under probabilistic context-free grammars or by a transition-based algorithm with rich features. Previous models rely heavily on richer syntactic information obtained by lexicalizing rules, splitting categories, or memorizing long histories. However, such enriched models incur large numbers of parameters and severe sparsity issues, and remain insufficient for capturing various syntactic phenomena. We propose a neural network structure that explicitly models the unbounded history of actions performed on the stack and queue employed in transition-based parsing, in addition to the representations of the partially parsed tree structure. Our transition-based neural constituent parser achieves performance comparable to state-of-the-art parsers, with F1 scores of 90.68% for English and 84.33% for Chinese, without reranking, feature templates, or additional data for training model parameters.
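To make the stack/queue machinery concrete, here is a minimal sketch of the shift-reduce transition system that underlies transition-based constituent parsing. This is a hypothetical simplification for illustration only: it uses binary REDUCE actions with an explicit constituent label, and it omits the neural scoring of actions from stack and queue history that the paper actually proposes.

```python
from collections import deque

def parse(words, actions):
    """Apply a sequence of (action, label) pairs to build a constituent tree.

    SHIFT moves the next word from the queue onto the stack;
    REDUCE pops the top two subtrees and combines them under `label`.
    """
    queue = deque(words)   # unread input words
    stack = []             # partially built subtrees
    for action, label in actions:
        if action == "SHIFT":
            stack.append(queue.popleft())
        elif action == "REDUCE":
            right, left = stack.pop(), stack.pop()
            stack.append((label, left, right))
    assert len(stack) == 1 and not queue, "actions must yield a single tree"
    return stack[0]

tree = parse(
    ["the", "cat", "sleeps"],
    [("SHIFT", None), ("SHIFT", None), ("REDUCE", "NP"),
     ("SHIFT", None), ("REDUCE", "S")],
)
# tree == ("S", ("NP", "the", "cat"), "sleeps")
```

In the full model, the action sequence is not given but predicted greedily, with each decision conditioned on learned representations of the entire stack, queue, and action history rather than on hand-crafted feature templates.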
