Statistical French Dependency Parsing: Treebank Conversion and First Results

We first describe the automatic conversion of the French Treebank (Abeille and Barrier, 2004), a constituency treebank, into typed projective dependency trees. In order to evaluate the overall quality of the resulting dependency treebank, and to quantify the cases where the projectivity constraint leads to wrong dependencies, we compare a subset of the converted treebank to manually validated dependency trees. We then compare the performance of two treebank-trained parsers that output typed dependency parses. The first parser is the MST parser (McDonald et al., 2006), which we directly train on dependency trees. The second parser is a combination of the Berkeley parser (Petrov et al., 2006) and a functional role labeler: trained on the original constituency treebank, the Berkeley parser first outputs constituency trees, which are then labeled with functional roles, and then converted into dependency trees. We found that used in combination with a high-accuracy French POS tagger, the MST parser performs a little better for unlabeled dependencies (UAS=90.3% versus 89.6%), and better for labeled dependencies (LAS=87.6% versus 85.6%).

[1]  Koby Crammer,et al.  Online Large-Margin Training of Dependency Parsers , 2005, ACL.

[2]  Marie Candito,et al.  Improving generative statistical parsing with semi-supervised word clustering , 2009, IWPT.

[3]  Josef van Genabith,et al.  Preparing, restructuring, and augmenting a French treebank:lexicalised parsers or coherent treebanks? , 2007 .

[4]  Koby Crammer,et al.  Ultraconservative Online Algorithms for Multiclass Problems , 2001, J. Mach. Learn. Res..

[5]  Christopher D. Manning,et al.  Generating Typed Dependency Parses from Phrase Structure Parses , 2006, LREC.

[6]  Marie Candito,et al.  Cross parser evaluation and tagset variation: a French treebank study , 2009 .

[7]  Sebastian Riedel,et al.  The CoNLL 2007 Shared Task on Dependency Parsing , 2007, EMNLP.

[8]  Anne Abeillé,et al.  Enriching a French Treebank , 2004, LREC.

[9]  Jason Eisner,et al.  Three New Probabilistic Models for Dependency Parsing: An Exploration , 1996, COLING.

[10]  Fernando Pereira,et al.  Multilingual Dependency Analysis with a Two-Stage Discriminative Parser , 2006, CoNLL.

[11]  Richard Johansson,et al.  Extended Constituent-to-Dependency Conversion for English , 2007, NODALIDA.

[12]  Michael Collins,et al.  Head-Driven Statistical Models for Natural Language Parsing , 2003, CL.

[13]  Dan Klein,et al.  Learning Accurate, Compact, and Interpretable Tree Annotation , 2006, ACL.

[14]  Marie Candito,et al.  Expériences d’analyse syntaxique statistique du français , 2008, JEPTALNRECITAL.

[15]  Yuji Matsumoto,et al.  Statistical Dependency Analysis with Support Vector Machines , 2003, IWPT.

[16]  Roger Levy,et al.  Tregex and Tsurgeon: tools for querying and manipulating tree data structures , 2006, LREC.

[17]  Pascal Denis,et al.  Analyse syntaxique du français : des constituants aux dépendances , 2009 .

[18]  Jun'ichi Tsujii,et al.  Probabilistic CFG with Latent Annotations , 2005, ACL.

[19]  Pascal Denis,et al.  Coupling an Annotated Corpus and a Morphosyntactic Lexicon for State-of-the-Art POS Tagging with Less Human Effort , 2009, PACLIC.

[20]  Fernando Pereira,et al.  Online Learning of Approximate Dependency Parsing Algorithms , 2006, EACL.

[21]  Frank Keller,et al.  Lexicalization in Crosslinguistic Probabilistic Parsing: The Case of French , 2005, ACL.

[22]  David M. Magerman Statistical Decision-Tree Models for Parsing , 1995, ACL.

[23]  Michael Collins,et al.  Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms , 2002, EMNLP.