Cross-framework parser stacking for data-driven dependency parsing

In this article, we present and evaluate an approach to the combination of a grammar-driven and a data-driven parser which exploits machine learning for the acquisition of syntactic analyses guided by both parsers. We show how conversion of LFG output to dependency representation allows for a technique of parser stacking, whereby the output of the grammar-driven parser supplies features for a data-driven dependency parser. We evaluate on English and German and show significant improvements in overall parse results stemming from the proposed dependency structure as well as other linguistic features derived from the grammars. Finally, we perform an application-oriented evaluation and explore the use of the stacked parsers as the basis for the projection of dependency annotation to a new language.
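The stacking idea described above can be illustrated with a minimal sketch. It assumes the grammar-driven parser's output has already been converted to dependency arcs, and it simply attaches the proposed head and label to each token as extra features that a data-driven parser could learn from. The `Token` class, the `stack_features` function, and the feature names `g_head` and `g_rel` are hypothetical names chosen for illustration; they are not the feature model used in the article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Token:
    idx: int                 # 1-based position in the sentence
    form: str                # word form
    pos: str                 # part-of-speech tag
    feats: Dict[str, str] = field(default_factory=dict)


def stack_features(
    tokens: List[Token],
    grammar_arcs: Dict[int, Tuple[int, str]],
) -> List[Token]:
    """Return copies of `tokens` enriched with grammar-derived features.

    `grammar_arcs` maps a dependent's index to (head index, label), as
    proposed by the converted grammar-driven analysis (hypothetical format).
    Tokens the grammar did not attach receive the placeholder "_".
    """
    stacked = []
    for tok in tokens:
        feats = dict(tok.feats)          # copy, so the input stays unchanged
        arc = grammar_arcs.get(tok.idx)
        if arc is None:
            feats["g_head"] = "_"
            feats["g_rel"] = "_"
        else:
            head, rel = arc
            feats["g_head"] = str(head)
            feats["g_rel"] = rel
        stacked.append(Token(tok.idx, tok.form, tok.pos, feats))
    return stacked


# Usage: the grammar attaches "She" to "sleeps" but leaves the period unattached.
sentence = [Token(1, "She", "PRP"), Token(2, "sleeps", "VBZ"), Token(3, ".", ".")]
arcs = {1: (2, "SUBJ"), 2: (0, "ROOT")}
for tok in stack_features(sentence, arcs):
    print(tok.idx, tok.form, tok.pos, tok.feats)
```

Using a placeholder for unattached tokens keeps the feature columns uniform, so the data-driven parser can still be trained on sentences for which the grammar yields only a partial analysis or none at all.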
