Joint Ensemble Model for POS Tagging and Dependency Parsing

In this paper we present several approaches to constructing joint ensemble models for morphosyntactic tagging and dependency parsing of a morphologically rich language, Bulgarian. In our experiments we use state-of-the-art taggers and dependency parsers to produce an extended version of the Bulgarian treebank, BulTreeBank, which contains, in addition to the standard CoNLL fields, the morphosyntactic tags and dependency arcs predicted for each word. To select the most suitable tag and arc from those proposed, we apply several ensemble techniques whose output is a valid dependency tree. Most of these approaches improve on the results achieved individually by the tagging and parsing tools.
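
As a rough illustration of what selecting among proposed tags and arcs can look like, the sketch below combines the outputs of several taggers and parsers by voting. It is not the method evaluated in the paper: the function names `vote_tags` and `vote_tree` are hypothetical, and a simple greedy, Prim-style attachment is assumed in place of an exact maximum spanning tree algorithm, so the resulting tree is valid but not guaranteed to have the highest total vote count.

```python
from collections import Counter


def vote_tags(tag_predictions):
    """Majority vote per token; ties go to the earliest-listed tagger.

    tag_predictions: one list of tags per tagger, all of the same length.
    """
    result = []
    for token_tags in zip(*tag_predictions):
        counts = Counter(token_tags)
        best = max(counts.values())
        # Scan in tagger order so that ties are broken deterministically.
        result.append(next(t for t in token_tags if counts[t] == best))
    return result


def vote_tree(head_predictions):
    """Build a valid dependency tree from the arcs proposed by several parsers.

    head_predictions: one list per parser; entry i is the predicted head of
    token i + 1 (tokens are numbered from 1, and 0 is the artificial root).
    Returns a list of heads in the same layout.

    Assumption: greedy growth from the root. At each step the most-voted arc
    from an already attached node to an unattached token is added, so no
    cycle can form and every token ends up connected to the root.
    """
    n = len(head_predictions[0])
    votes = Counter()
    for heads in head_predictions:
        for dep, head in enumerate(heads, start=1):
            votes[(head, dep)] += 1

    in_tree = {0}
    heads = [None] * n
    while len(in_tree) <= n:
        candidates = [
            (votes[(h, d)], h, d)
            for h in in_tree
            for d in range(1, n + 1)
            if d not in in_tree
        ]
        # Highest vote count first, then smallest dependent, then smallest head.
        _, head, dep = max(candidates, key=lambda c: (c[0], -c[2], -c[1]))
        heads[dep - 1] = head
        in_tree.add(dep)
    return heads


# Illustrative input only: three hypothetical parsers on a two-token sentence.
# Per-token majority voting would pick head 2 for token 1 and head 1 for
# token 2, which is a cycle; the greedy construction avoids it.
proposed_heads = [[2, 1], [2, 1], [0, 1]]
tree = vote_tree(proposed_heads)
```

Arcs that no parser proposed receive zero votes and are used only when nothing better connects a token to the partial tree. A single-root constraint is not enforced here; a stricter variant would need to handle it explicitly.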
