Benchmarking of Statistical Dependency Parsers for French

We compare the performance of three statistical parsing architectures on the task of deriving typed dependency structures for French. The architectures are based on PCFGs with latent variables, graph-based dependency parsing, and transition-based dependency parsing, respectively. We also study the influence of three types of lexical information: lemmas, morphological features, and word clusters. The results show that all three systems achieve competitive performance, with the best labeled attachment score (LAS) exceeding 88%. All three parsers benefit from automatically derived lemmas, while morphological features appear to be less important. Word clusters have a positive effect primarily on the latent-variable parser.
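The labeled attachment score reported above is the fraction of tokens that receive both the correct head and the correct dependency label. The sketch below illustrates that definition on a toy sentence, together with the unlabeled variant that checks heads only. The function name, the `(head, label)` representation, and the label strings are assumptions made for illustration. They are not taken from the paper's evaluation setup, and whether punctuation tokens are scored is left to the caller.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# Head index 0 stands for the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Tuple[float, float]:
    """Return (LAS, UAS) for two analyses of the same token sequence."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token sequence")

    # Unlabeled hit: the head matches. Labeled hit: head and label both match.
    unlabeled_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    labeled_hits = sum(g == p for g, p in zip(gold, predicted))

    n = len(gold)
    return labeled_hits / n, unlabeled_hits / n


# Toy example for "Le chat dort" (labels are hypothetical).
gold = [(2, "det"), (3, "suj"), (0, "root")]
pred = [(2, "det"), (3, "obj"), (0, "root")]  # right head, wrong label on token 2

las, uas = attachment_scores(gold, pred)
print(f"LAS = {las:.3f}, UAS = {uas:.3f}")
```

In the toy example, one token has the right head but the wrong label, so it counts toward the unlabeled score only.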
