Parsing with Compositional Vector Grammars

Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation captures neither the full syntactic nor the full semantic richness of linguistic phrases, and attempts to improve on it by lexicalizing phrases or splitting categories only partly address the problem, at the cost of huge feature spaces and sparseness. Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations. The CVG improves the PCFG of the Stanford Parser by 3.8% to obtain an F1 score of 90.4%. It is fast to train and, when implemented approximately as an efficient reranker, it is about 20% faster than the current Stanford factored parser. The CVG learns a soft notion of head words and improves performance on the types of ambiguities that require semantic information, such as PP attachment.
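
To make the scoring idea concrete, the sketch below shows one way a syntactically untied recursive composition plus a PCFG rule probability could be combined into a single node score, as the abstract describes. This is a minimal illustration under assumed details: the dimensionality, the helper names (compose, score_rule), and the specific parameterization (one weight matrix and scoring vector per pair of child categories, tanh nonlinearity) are assumptions for exposition, not the paper's exact implementation.

```python
import numpy as np

# Minimal CVG-style scoring sketch (illustrative, not the paper's code).
# A "syntactically untied" RNN keeps a separate composition matrix W and
# scoring vector v for each pair of child syntactic categories.

d = 50                                # phrase-vector dimensionality (assumption)
rng = np.random.default_rng(0)

# Untied parameters, keyed by (left child category, right child category).
W = {("NP", "VP"): 0.01 * rng.standard_normal((d, 2 * d))}
v = {("NP", "VP"): 0.01 * rng.standard_normal(d)}

def compose(left_vec, right_vec, left_cat, right_cat):
    """Parent phrase vector from the two child vectors, using the weight
    matrix tied to the children's syntactic categories."""
    child = np.concatenate([left_vec, right_vec])
    return np.tanh(W[(left_cat, right_cat)] @ child)

def score_rule(parent_vec, left_cat, right_cat, log_pcfg_prob):
    """Score one binary rule application: a semantic score from the parent
    vector plus the log PCFG probability of the rule."""
    return v[(left_cat, right_cat)] @ parent_vec + log_pcfg_prob

# Example: score an S -> NP VP combination of two phrase vectors.
np_vec, vp_vec = rng.standard_normal(d), rng.standard_normal(d)
parent = compose(np_vec, vp_vec, "NP", "VP")
node_score = score_rule(parent, "NP", "VP", log_pcfg_prob=-2.3)
print(node_score)
```

In a full parser of this kind, a tree's total score would sum such node scores over all binary constituents, and the model would be applied as a reranker over the PCFG's k-best candidate trees rather than during exhaustive search, which is consistent with the efficiency claim above.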
