Optimizing a PoS Tagset for Norwegian Dependency Parsing

This paper reports on a suite of experiments that evaluate how the linguistic granularity of part-of-speech tagsets affects the performance of tagging and syntactic dependency parsing. Our results show that parsing accuracy can be significantly improved by introducing more fine-grained morphological information into the tagset, even if tagger accuracy is compromised. Our taggers and parsers are trained and tested on the annotations of the Norwegian Dependency Treebank.
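To make the notion of tagset granularity concrete, here is a minimal Python sketch, under stated assumptions, of one way a family of tagsets can be derived by appending selected morphological features to a coarse PoS tag, together with the standard definitions of tag accuracy and labeled attachment score (LAS). The token data, the feature names (`def`, `num`, `tense`), the dependency labels and the helpers (`refine_tag`, `coarsen`, `accuracy`) are hypothetical illustrations; they are not the Norwegian Dependency Treebank's actual annotation format or the paper's experimental code.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    form: str
    pos: str  # coarse part of speech, e.g. "subst" (noun) or "verb"
    feats: dict = field(default_factory=dict)  # hypothetical morphological features


def refine_tag(token: Token, selected: list[str]) -> str:
    """Build a tag from the coarse PoS plus the selected features, in the given order.

    With selected == [] this returns the coarse tag; each added feature
    makes the resulting tagset more fine-grained.
    """
    parts = [token.pos]
    for name in selected:
        if name in token.feats:
            parts.append(f"{name}={token.feats[name]}")
    return "|".join(parts)


def coarsen(tag: str) -> str:
    """Project a refined tag back onto its coarse PoS."""
    return tag.split("|")[0]


def accuracy(gold: list, predicted: list) -> float:
    """Fraction of positions where the prediction equals gold (tags or arcs)."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy corpus of two sentences, "hunden bjeffer" and "hunder bjeffet".
# Feature names and values are illustrative assumptions only.
corpus = [
    Token("hunden", "subst", {"def": "be", "num": "ent"}),
    Token("bjeffer", "verb", {"tense": "pres"}),
    Token("hunder", "subst", {"def": "ub", "num": "fl"}),
    Token("bjeffet", "verb", {"tense": "pret"}),
]

selected = ["def", "num", "tense"]
coarse_tagset = {refine_tag(t, []) for t in corpus}
fine_tagset = {refine_tag(t, selected) for t in corpus}

gold_tags = [refine_tag(t, selected) for t in corpus]
pred_tags = list(gold_tags)
pred_tags[2] = "subst|def=ub|num=ent"  # simulated tagger error on number

fine_acc = accuracy(gold_tags, pred_tags)
coarse_acc = accuracy([coarsen(t) for t in gold_tags], [coarsen(t) for t in pred_tags])

# LAS: a token counts as correct only if both its head and its label are right.
# Heads are 1-based positions within each sentence; 0 marks the root.
gold_arcs = [(2, "SUBJ"), (0, "ROOT"), (2, "SUBJ"), (0, "ROOT")]
pred_arcs = [(2, "SUBJ"), (0, "ROOT"), (2, "DOBJ"), (0, "ROOT")]
las = accuracy(gold_arcs, pred_arcs)

print(len(coarse_tagset), len(fine_tagset), fine_acc, coarse_acc, las)
# prints: 2 4 0.75 1.0 0.75
```

In this toy setup the refined tagset has twice as many tags, and a single number error lowers fine-grained tag accuracy while coarse accuracy is unaffected. This only illustrates the kind of trade-off the abstract describes; it says nothing about the paper's actual results.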
