Crowdsourcing Swiss Dialect Transcriptions for Assessing Factors in Writing Variations

In this paper, we systematically analyze writing variations of Swiss German in two existing corpora with Standard German glosses: a corpus of 10,000 short text messages and a corpus of transcribed oral history recordings (90,000 tokens). We show that neither resource is sufficient for assessing the factors that drive writing variations among users, and we describe a data collection project that involves a citizen science community to address this gap. Lay participants will independently and redundantly transcribe 1,200 short samples (15-20 seconds each) of Swiss German audio material according to their own best practice.
