The Computational Analysis of Bulgarian Dialect Pronunciation

This paper presents a computational analysis of Bulgarian dialect variation, concentrating on pronunciation differences. It describes the phonetic data set compiled during the project ‘Measuring Linguistic Unity and Diversity in Europe’, which consists of the pronunciations of 157 words collected at 197 sites across Bulgaria. We also present the results of analyzing this data set with various quantitative methods and compare them with the traditional scholarship on Bulgarian dialects. The results show that various dialectometrical techniques clearly identify an east-west division of the country along the ‘jat’ border, as well as a third group of varieties in the Rodopi area. The remaining groups specified in the traditional atlases were either not confirmed or confirmed only with low confidence.
