Exploring Compositional Architectures and Word Vector Representations for Prepositional Phrase Attachment

Prepositional phrase (PP) attachment disambiguation is a well-known challenge in syntactic parsing. The lexical sparsity associated with PP attachments motivates research into word representations that capture the pertinent syntactic and semantic features of a word. One promising solution is to use word vectors induced from large amounts of raw text. However, state-of-the-art systems that employ such representations have yielded only modest gains in PP attachment accuracy. In this paper, we show that word vector representations can yield significant gains in PP attachment performance. This is achieved via a non-linear architecture that is discriminatively trained to maximize PP attachment accuracy. The architecture is initialized with word vectors trained on unlabeled data and relearns them during training to maximize attachment accuracy. We obtain additional performance gains with alternative representations such as dependency-based word vectors. When tested on both English and Arabic datasets, our method outperforms both a strong SVM classifier and state-of-the-art parsers. For instance, we achieve 82.6% PP attachment accuracy on Arabic, while the Turbo and Charniak self-trained parsers obtain 76.7% and 80.8%, respectively.
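
To make the idea of a non-linear compositional scorer over word vectors concrete, the following is a minimal sketch and not the architecture from the paper. The abstract does not specify the composition order, the parameter shapes, or the loss, so everything here is an assumption made for illustration: the names `compose`, `score`, `predict`, and `hinge_loss`, the single shared matrix `W`, the toy dimensionality, and the max-margin objective. Random vectors stand in for pre-trained word vectors, and the gradient updates that would relearn both the parameters and the vectors are omitted.

```python
import math
import random

DIM = 4  # toy dimensionality (assumption); real word vectors are far larger


def init_matrix(rows, cols, rng):
    # Uniform initialization scaled by fan-in and fan-out (an assumption).
    bound = math.sqrt(6.0 / (rows + cols))
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]


def compose(W, b, left, right):
    # Non-linear composition of two vectors: tanh(W [left; right] + b).
    x = left + right  # list concatenation
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + bias)
            for row, bias in zip(W, b)]


def score(params, head, prep, child):
    # Compose the PP (preposition + child), then attach it to a candidate head.
    pp = compose(params["W"], params["b"], prep, child)
    attached = compose(params["W"], params["b"], head, pp)
    return sum(wi * hi for wi, hi in zip(params["w"], attached))


def predict(params, vectors, heads, prep, child):
    # Return the index of the highest-scoring candidate head.
    scores = [score(params, vectors[h], vectors[prep], vectors[child])
              for h in heads]
    return max(range(len(heads)), key=scores.__getitem__)


def hinge_loss(scores, gold):
    # Max-margin loss: the gold head should beat every other head by 1.
    worst = max(s for i, s in enumerate(scores) if i != gold)
    return max(0.0, 1.0 + worst - scores[gold])


rng = random.Random(0)
vocab = ["ate", "spaghetti", "with", "fork"]
# Random stand-ins for pre-trained word vectors (assumption).
vectors = {w: [rng.uniform(-0.1, 0.1) for _ in range(DIM)] for w in vocab}
params = {
    "W": init_matrix(DIM, 2 * DIM, rng),
    "b": [0.0] * DIM,
    "w": [rng.uniform(-0.1, 0.1) for _ in range(DIM)],
}

# Candidate heads for the PP "with fork" in "ate spaghetti with fork".
best = predict(params, vectors, ["ate", "spaghetti"], "with", "fork")
print("predicted head index:", best)
```

In a trained system the loss would be minimized over labeled attachments, with gradients flowing into the word vectors as well as the composition parameters, which is one way to read "relearns" in the abstract.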
