The LIGM-Alpage architecture for the SPMRL 2013 Shared Task: Multiword Expression Analysis and Dependency Parsing

This paper describes the LIGM-Alpage system for the SPMRL 2013 Shared Task. We participated only in the French part of the dependency parsing track, focusing on the realistic setting in which the system is given neither gold tagging and morphology nor, more importantly, gold grouping of tokens into multi-word expressions (MWEs). While the realistic scenario of predicting both MWEs and syntax has already been investigated for constituency parsing, the SPMRL 2013 shared task datasets make it possible to investigate it in the dependency framework. We obtain the best results for French, both for overall parsing and for MWE recognition, using a reparsing architecture that combines several parsers, covering both a pipeline architecture (MWE recognition followed by parsing) and a joint architecture (MWE recognition performed by the parser).
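To make the idea of combining several parsers concrete, the following is a minimal sketch of arc voting, the first step of a reparsing-style combination. It is not the authors' implementation: the function names (`vote_arcs`, `greedy_heads`, `is_tree`), the head-list encoding, and the toy parses are all assumptions made for illustration. A full reparsing system would typically decode the voted arc weights with a spanning-tree or projective parsing algorithm; here that step is replaced by a simple per-token majority choice plus a well-formedness check.

```python
from collections import Counter


def vote_arcs(parses):
    """Count, for each token, how many parsers proposed each head.

    parses: list of head lists (hypothetical encoding); parses[k][i] is the
    head assigned by parser k to token i+1, with 0 denoting the artificial root.
    """
    n = len(parses[0])
    weights = [Counter() for _ in range(n)]
    for heads in parses:
        for i, head in enumerate(heads):
            weights[i][head] += 1
    return weights


def greedy_heads(weights):
    """Pick the most-voted head for each token.

    This is a simplification: unlike spanning-tree decoding, it does not
    guarantee that the result is a tree.
    """
    return [w.most_common(1)[0][0] for w in weights]


def is_tree(heads):
    """Check that every token reaches the root (0) without entering a cycle."""
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


# Toy outputs of three hypothetical parsers for a 4-token sentence.
parses = [
    [2, 0, 2, 3],
    [2, 0, 4, 2],
    [2, 0, 2, 2],
]
combined = greedy_heads(vote_arcs(parses))
print(combined, is_tree(combined))
```

In this sketch the parsers could equally be pipeline or joint systems, since only their output head assignments are used. When the majority choice fails the `is_tree` check, a proper decoder over the voted weights would be needed to recover a valid dependency tree.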
