Paper information - A Resource for Natural Language Processing of Swiss German Dialects

A Resource for Natural Language Processing of Swiss German Dialects

Since there are only a few resources for Swiss German dialects, we compiled a corpus of 115,000 tokens, manually annotated with PoS tags. The goal is to provide a basic data set for developing NLP applications for Swiss German. We extended the original corpus and improved its annotation consistency. Furthermore, we trained dialect-specific PoS-tagging models and implemented a baseline system for dialect identification.

Nora Hollenstein | Noëmi Aepli

[1] Yves Scherrer, et al. Word-Based Dialect Identification with Georeferenced Rules, 2010, EMNLP.

[2] Nora Hollenstein, et al. Compilation of a Swiss German Dialect Corpus and its Application to PoS Tagging, 2014, VarDial@COLING.

[3] Walt Detmar Meurers, et al. Detecting Errors in Part-of-Speech Annotation, 2003, EACL.

[4] Yves Scherrer, et al. Natural Language Processing for the Swiss German Dialect Area, 2010, KONVENS.

[5] Tanja Samardzic, et al. Lemmatisation as a Tagging Task, 2012, ACL.