Deep Learning for Efficient Discriminative Parsing

We propose a new, fast, purely discriminative algorithm for natural language parsing, based on a “deep” recurrent convolutional graph transformer network (GTN). Assuming a decomposition of a parse tree into a stack of “levels”, the network predicts each level of the tree taking into account the predictions for previous levels. Using only a few basic text features that leverage the word representations of Collobert and Weston (2008), we show performance similar (in F1 score) to existing purely discriminative parsers and to standard “benchmark” parsers (such as the Collins parser, which is based on probabilistic context-free grammars), with a substantial speed advantage.
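To make the “levels” decomposition concrete, here is a minimal Python sketch (not the authors' implementation): the lowest constituents, those spanning only words, form level 1; collapsing them and reading off the next layer of lowest constituents yields level 2, and so on up to the root. The tree encoding, the `tree_levels` helper, and the half-open word spans are illustrative choices of ours, not part of the paper.

```python
from typing import List, Tuple

# A node is (label, children); a child is either a word (str) or another node.
Tree = Tuple[str, list]

def tree_levels(tree: Tree) -> List[List[Tuple[str, Tuple[int, int]]]]:
    """Split a parse tree into a stack of levels: level 1 holds the lowest
    constituents, and each higher level is what becomes lowest once the
    levels below it are collapsed. Each level is a list of
    (label, (start, end)) chunks with half-open word spans."""
    levels: List[List[Tuple[str, Tuple[int, int]]]] = []

    def walk(node: Tree, start: int) -> Tuple[int, int]:
        # Returns (end position, height); height 1 = lowest constituents.
        label, children = node
        pos, height = start, 0
        for child in children:
            if isinstance(child, str):   # a word occupies one position
                pos += 1
            else:                        # recurse into a sub-constituent
                pos, child_height = walk(child, pos)
                height = max(height, child_height)
        height += 1                      # this node sits one level higher
        while len(levels) < height:
            levels.append([])
        levels[height - 1].append((label, (start, pos)))
        return pos, height

    walk(tree, 0)
    return levels

# Example: "the cat sat on the mat"
tree = ("S", [("NP", ["the", "cat"]),
              ("VP", ["sat", ("PP", ["on", ("NP", ["the", "mat"])])])])
for i, level in enumerate(tree_levels(tree), start=1):
    print(f"level {i}: {level}")
# level 1: [('NP', (0, 2)), ('NP', (4, 6))]
# level 2: [('PP', (3, 6))]
# level 3: [('VP', (2, 6))]
# level 4: [('S', (0, 6))]
```

Under this view, parsing reduces to a sequence of tagging problems, one per level, which is what lets the network predict each level while conditioning on its predictions for the levels below.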

[1] Andrew J. Viterbi. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory, 1967.

[2] Yann LeCun. Une procédure d'apprentissage pour réseau à seuil asymétrique (A learning scheme for asymmetric threshold networks), 1985.

[3] D. Rumelhart. Learning internal representations by back-propagating errors, 1986.

[4] Geoffrey E. Hinton et al. Learning sets of filters using back-propagation, 1987.

[5] Lawrence D. Jackel et al. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1989.

[6] L. Bottou. Stochastic Gradient Learning in Neural Networks, 1991.

[7] Beatrice Santorini et al. Building a Large Annotated Corpus of English: The Penn Treebank. CL, 1993.

[8] David M. Magerman. Statistical Decision-Tree Models for Parsing. ACL, 1995.

[9] Michael Collins. A New Statistical Parser Based on Bigram Lexical Dependencies. ACL, 1996.

[10] Steven P. Abney. Partial parsing via finite-state cascades. Natural Language Engineering, 1996.

[11] Yoshua Bengio et al. Global training of document processing systems using graph transformer networks. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1997.

[12] Yoshua Bengio et al. Gradient-based learning applied to document recognition. Proc. IEEE, 1998.

[13] Yoshua Bengio et al. A Neural Probabilistic Language Model. J. Mach. Learn. Res., 2003.

[14] Eugene Charniak. A Maximum-Entropy-Inspired Parser. ANLP, 2000.

[15] Andrew McCallum et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML, 2001.

[16] Michael Collins. Head-Driven Statistical Models for Natural Language Parsing. CL, 2003.

[17] Fernando Pereira et al. Shallow Parsing with Conditional Random Fields. NAACL, 2003.

[18] Wei Li et al. Early results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons. CoNLL, 2003.

[19] James Henderson. Discriminative Training of a Neural Network Statistical Parser. ACL, 2004.

[20] Ben Taskar et al. Max-Margin Parsing. EMNLP, 2004.

[21] Adwait Ratnaparkhi. Learning to Parse Natural Language with Maximum Entropy Models. Machine Learning, 1999.

[22] Eugene Charniak et al. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking. ACL, 2005.

[23] Phil Blunsom et al. Semantic Role Labelling with Tree Conditional Random Fields. CoNLL, 2005.

[24] I. Dan Melamed et al. Advances in Discriminative Parsing. ACL, 2006.

[25] Xavier Carreras et al. TAG, Dynamic Programming, and the Perceptron for Efficient, Feature-Rich Parsing. CoNLL, 2008.

[26] Ronan Collobert and Jason Weston. A unified architecture for natural language processing: deep neural networks with multitask learning. ICML, 2008.

[27] Dan Klein et al. Sparse Multi-Scale Grammars for Discriminative Latent Variable Parsing. EMNLP, 2008.

[28] Christopher D. Manning et al. Efficient, Feature-based, Conditional Random Field Parsing. ACL, 2008.

[29] Ronan Collobert et al. Natural Language Processing (Almost) from Scratch. J. Mach. Learn. Res., 2011.