PassPort: A Dependency Parsing Model for Portuguese

Parsers are essential tools for several NLP applications. Here we introduce PassPort, a model for the dependency parsing of Portuguese trained with the Stanford Parser. To develop PassPort, we compared several setups that used different existing parsing algorithms and combinations of linguistic information, and observed which approach performed best. PassPort achieved a UAS of 87.55 and a LAS of 85.21 on the Universal Dependencies corpus. We also evaluated the model's performance against another model on different corpora covering three genres. For that, we annotated random sentences from these corpora using PassPort and the PALAVRAS parsing system, and then manually evaluated and compared the output of both. They achieved very similar results for dependency parsing, with a LAS of 85.02 for PassPort against 84.36 for PALAVRAS. In addition, the analysis showed that better part-of-speech tagging performance could improve our LAS.
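
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how UAS (unlabeled attachment score) and LAS (labeled attachment score) are conventionally computed. It is not the evaluation code used for PassPort. The function name `attachment_scores`, the `Arc` type, and the toy sentence are assumptions made for illustration only.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# Head index 0 conventionally denotes the artificial root.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned token lists."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")

    # UAS: the predicted head matches the gold head.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: both the head and the dependency label match.
    arc_hits = sum(g == p for g, p in zip(gold, pred))

    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * arc_hits / total


# Hypothetical four-token sentence (illustrative values, not from the paper).
gold = [(2, "det"), (3, "nsubj"), (0, "root"), (3, "obj")]
pred = [(2, "det"), (3, "obj"), (0, "root"), (2, "obj")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")
```

In this toy case three of the four heads are correct, but only two tokens have both the correct head and the correct label, so LAS is lower than UAS. Because LAS requires the label to match as well, it can never exceed UAS, which is consistent with the scores reported above.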
