Algorithmic Exploration of American English Dialects

In this paper, we use a novel algorithmic approach to explore dialectal variation in American English speech. Because the approach requires no human phonemic annotations, we can use an existing corpus that is transcribed only as text. Our results show that, in general, American English dialects can be divided into two broad groups: the dialects of the South (Texas to North Carolina, excluding peninsular Florida) and those of the rest of the country. Our results confirm some well-known findings from dialectology, such as the pin-pen merger, but suggest that others, such as the cot-caught merger, may be losing their isogloss boundaries. Moreover, we demonstrate that our algorithm can extend to dialectal features in other languages.
