tweeDe – A Universal Dependencies treebank for German tweets

We introduce the first German treebank for Twitter microtext, annotated within the framework of Universal Dependencies. The new treebank includes over 12,000 tokens from over 500 tweets, independently annotated by two human coders. In this paper, we describe the data selection and annotation process and present baseline parsing results for the new test suite.
