Training Parsers on Partial Trees: A Cross-language Comparison

We present a study that compares data-driven dependency parsers obtained by means of annotation projection between language pairs of varying structural similarity. We show how the partial dependency trees projected from English to Dutch, Italian and German can be exploited to train parsers for the target languages. We evaluate the parsers against manual gold standard annotations and find that the projected parsers substantially outperform our heuristic baselines by 9―25% UAS, which corresponds to a 21―43% reduction in error rate. A comparative error analysis focuses on how the projected target language parsers handle subjects, which is especially interesting for Italian as an instance of a pro-drop language. For Dutch, we further present experiments with German as an alternative source language. In both source languages, we contrast standard baseline parsers with parsers that are enhanced with the predictions from large-scale LFG grammars through a technique of parser stacking, and show that improvements of the source language parser can directly lead to similar improvements of the projected target language parser.

[1]  Sabine Buchholz,et al.  CoNLL-X Shared Task on Multilingual Dependency Parsing , 2006, CoNLL.

[2]  Stefan Riezler,et al.  Speed and Accuracy in Shallow and Deep Stochastic Parsing , 2004, NAACL.

[3]  Mirella Lapata,et al.  Optimal Constituent Alignment with Edge Covers for Semantic Projection , 2006, ACL.

[4]  Lilja Øvrelid,et al.  Cross-framework parser stacking for data-driven dependency parsing , 2009, TAL.

[5]  Joakim Nivre,et al.  Labeled Pseudo-Projective Dependency Parsing with Support Vector Machines , 2006, CoNLL.

[6]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[7]  Jonas Kuhn,et al.  Data-Driven Dependency Parsing of New Languages Using Incomplete and Noisy Training Data , 2009, CoNLL.

[8]  Joakim Nivre,et al.  Integrating Graph-Based and Transition-Based Dependency Parsers , 2008, ACL.

[9]  Berthold Crysmann,et al.  Towards a Dependency-Based Gold Standard for German Parsers. The TIGER Dependency Bank , 2004, International Workshop On Linguistically Interpreted Corpora.

[10]  Philip Resnik,et al.  Bootstrapping parsers via syntactic projection across parallel texts , 2005, Natural Language Engineering.

[11]  Erhard W. Hinrichs,et al.  Is it Really that Difficult to Parse German? , 2006, EMNLP.

[12]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[13]  Lilja Øvrelid,et al.  Improving data-driven dependency parsing using large-scale LFG grammars , 2009, ACL/IJCNLP.

[14]  Helmut Schmidt,et al.  Probabilistic part-of-speech tagging using decision trees , 1994 .

[15]  Kathrin Spreyer,et al.  Projection-based Acquisition of a Temporal Labeller , 2008, IJCNLP.

[16]  David Yarowsky,et al.  Inducing Multilingual Text Analysis Tools via Robust Projection across Aligned Corpora , 2001, HLT.

[17]  Gertjan van Noord,et al.  The Alpino Dependency Treebank , 2001, CLIN.

[18]  Mirella Lapata,et al.  Cross-Lingual Bootstrapping of Semantic Lexicons: The Case of FrameNet , 2005, AAAI.

[19]  Sabine Brants,et al.  The TIGER Treebank , 2001 .