Towards a syntactically motivated analysis of modifiers in German

The Stuttgart-Tübingen Tagset (STTS) is a widely used POS annotation scheme for German that provides 54 tags for analysis at the part-of-speech level. The tagset, however, does not distinguish between adverbs and the different types of particles used to express modality, intensity, or graduation, or to mark the focus of the sentence. In this paper, we present an extension to the STTS that provides tags for a more fine-grained analysis of modification, based on a syntactic perspective on parts of speech. We argue that the new classification not only enables corpus-based linguistic studies of modification, but also improves statistical parsing. As a proof of concept, we train a data-driven dependency parser on data from the TiGer treebank, providing the parser (a) with the original STTS tags and (b) with the new tags. Results show improved labelled accuracy for the new, syntactically motivated classification.
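
To make the evaluation measure concrete, the following is a minimal sketch of how labelled accuracy is commonly computed for dependency parsing: a token counts as correct only if both its predicted head and its predicted dependency label match the gold standard. The function name `labelled_accuracy`, the data layout, and the toy analyses are assumptions made for illustration; they are not the paper's evaluation code, data, or results.

```python
def labelled_accuracy(gold, predicted):
    """Share of tokens whose predicted head AND dependency label both match gold.

    Each analysis is a list with one (head_index, label) pair per token.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        return 0.0
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


# Toy, made-up analyses of a four-token sentence (illustrative only).
# Head index 0 stands for the artificial root; labels are placeholders.
gold = [(2, "SB"), (0, "ROOT"), (2, "MO"), (2, "OA")]
predicted = [
    (2, "SB"),    # correct head and label
    (0, "ROOT"),  # correct head and label
    (4, "MO"),    # right label, wrong head: counted as an error
    (2, "MO"),    # right head, wrong label: counted as an error
]

score = labelled_accuracy(gold, predicted)
print(f"labelled accuracy: {score:.2f}")
```

Comparing two tagsets under this measure would mean running the same parser twice, once per tagset, and scoring both outputs against the same gold dependencies.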
