Turku Neural Parser Pipeline: An End-to-End System for the CoNLL 2018 Shared Task

In this paper we describe the TurkuNLP entry at the CoNLL 2018 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies. Compared to the last year, this year the shared task includes two new main metrics to measure the morphological tagging and lemmatization accuracies in addition to syntactic trees. Basing our motivation into these new metrics, we developed an end-to-end parsing pipeline especially focusing on developing a novel and state-of-the-art component for lemmatization. Our system reached the highest aggregate ranking on three main metrics out of 26 teams by achieving 1st place on metric involving lemmatization, and 2nd on both morphological tagging and parsing.

[1]  Alexander M. Rush,et al.  OpenNMT: Open-Source Toolkit for Neural Machine Translation , 2017, ACL.

[2]  Martin Potthast,et al.  CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies , 2018, CoNLL.

[3]  Timothy Dozat,et al.  Stanford’s Graph-based Neural Dependency Parser at the CoNLL 2017 Shared Task , 2017, CoNLL.

[4]  Timothy Dozat,et al.  Deep Biaffine Attention for Neural Dependency Parsing , 2016, ICLR.

[5]  Tomas Mikolov,et al.  Enriching Word Vectors with Subword Information , 2016, TACL.

[6]  André F. T. Martins,et al.  Marian: Fast Neural Machine Translation in C++ , 2018, ACL.

[7]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[8]  Nizar Habash,et al.  CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies , 2017, CoNLL.

[9]  Ryan Cotterell,et al.  CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection in 52 Languages , 2017, CoNLL.

[10]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[11]  Francis M. Tyers,et al.  Free/Open-Source Resources in the Apertium Platform for Machine Translation Research and Development , 2010, Prague Bull. Math. Linguistics.

[12]  Katharina Kann,et al.  The LMU System for the CoNLL-SIGMORPHON 2017 Shared Task on Universal Morphological Reinflection , 2017, CoNLL.

[13]  Jörg Tiedemann,et al.  Parallel Data, Tools and Interfaces in OPUS , 2012, LREC.

[14]  Kenneth Heafield,et al.  KenLM: Faster and Smaller Language Model Queries , 2011, WMT@EMNLP.

[15]  Katharina Kann,et al.  Training Data Augmentation for Low-Resource Morphological Inflection , 2017, CoNLL.

[16]  Daniel Zeman,et al.  CoNLL 2017 Shared Task - Automatically Annotated Raw Texts and Word Embeddings , 2017 .

[17]  Jan Hajic,et al.  UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing , 2016, LREC.