The ParisNLP entry at the ConLL UD Shared Task 2017: A Tale of a #ParsingTragedy

We present the ParisNLP entry at the UDCoNLL 2017 parsing shared task. In addition to the UDpipe models provided, we built our own data-driven tokenization models, sentence segmenter and lexicon- based morphological analyzers. All of these were used with a range of different parsing models (neural or not, feature-rich or not, transition or graph-based, etc.) and the best combination for each language was selected. Unfortunately, a glitch in the shared task’s Matrix led our model selector to run generic, weakly lexicalized mod- els, tailored for surprise languages, instead of our dataset-specific models. Because of this #ParsingTragedy, we officially ranked 27th, whereas our real models finally unofficially ranked 6th.

[1]  Sabine Buchholz,et al.  CoNLL-X Shared Task on Multilingual Dependency Parsing , 2006, CoNLL.

[2]  Slav Petrov,et al.  Overview of the 2012 Shared Task on Parsing the Web , 2012 .

[3]  Nizar Habash,et al.  Overview of the SPMRL 2013 Shared Task: A Cross-Framework Evaluation of Parsing Morphologically Rich Languages , 2013, SPMRL@EMNLP.

[4]  Fernando Pereira,et al.  Non-Projective Dependency Parsing using Spanning Tree Algorithms , 2005, HLT.

[5]  Jan Hajic,et al.  UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing , 2016, LREC.

[6]  Reut Tsarfaty,et al.  Introducing the SPMRL 2014 Shared Task on Parsing Morphologically-rich Languages , 2014 .

[7]  Sebastian Riedel,et al.  The CoNLL 2007 Shared Task on Dependency Parsing , 2007, EMNLP.

[8]  Éric Villemonte de la Clergerie,et al.  Exploring beam-based shift-reduce dependency parsing with DyALog: Results from the SPMRL 2013 shared task , 2013, SPMRL@EMNLP.

[9]  Martin Potthast,et al.  CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies , 2018, CoNLL.

[10]  Sampo Pyysalo,et al.  Universal Dependencies v1: A Multilingual Treebank Collection , 2016, LREC.

[11]  Pascal Denis,et al.  Coupling an annotated corpus and a lexicon for state-of-the-art POS tagging , 2012, Lang. Resour. Evaluation.

[12]  Francis M. Tyers,et al.  Universal Dependencies , 2017, EACL.

[13]  Benno Stein,et al.  Improving the Reproducibility of PAN's Shared Tasks: - Plagiarism Detection, Author Identification, and Author Profiling , 2014, CLEF.