Drawing areal information from a corpus of noisy dialect data

This article analyzes linguistic survey data on the German dialects of Switzerland, collected in 1933/34 using the so-called Wenker sentences. The phonetic transcriptions are impressionistic: they were produced by non-specialists using the Latin alphabet. Because no standard was defined in advance, the transcriptions are very heterogeneous. From a technical perspective, this makes the data very noisy, which is why the validity of the Wenker data in general, and of the Swiss Wenker data in particular, has been questioned. Using methods from computational linguistics, we compare, for the first time, Wenker data with linguistic data collected at virtually the same time by professional linguists. Direct comparison with a sample from the published atlas of German-speaking Switzerland (SDS) reveals that the data, despite their noisiness, provide reliable information, e.g., on the spatial structuring of Swiss dialects. The study thus serves as a successful pilot for other corpus-based studies dealing with unstructured Wenker data in other regions.
