Automatically measuring the strength of foreign accents in English

We measure the differences between the pronunciations of native and non-native speakers of American English using a modified version of the Levenshtein (or string edit) distance applied to phonetic transcriptions. Although this measure is well understood theoretically, and variants of it have been used successfully to study dialect pronunciations, the comprehensibility of related language varieties, and the atypicality of the speech of cochlear implant users, it has not previously been applied to the study of foreign accents. In this paper we briefly present an appropriate version of the Levenshtein distance and apply it to compare the pronunciations of non-native English speakers with native American English speech. We show that the computational measurements correlate strongly with the average “native-like” judgments given by more than 1000 native U.S. English raters (r = -0.8, p < 0.001). This indicates that the Levenshtein distance is a suitable measure of “native-likeness” in studies of foreign accent.
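As a rough illustration of the idea, the sketch below computes a plain Levenshtein distance over sequences of phonetic segments. It is not the modified version used in the paper, whose details are not given in this abstract: the unit costs, the normalization by the longer transcription, and the example transcriptions are all assumptions made for illustration only.

```python
def levenshtein(a, b):
    """Plain edit distance between two sequences of phonetic segments.

    Assumption: every insertion, deletion, and substitution costs 1.
    The paper's modified measure may weight operations differently.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, seg_a in enumerate(a, start=1):
        curr = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            curr.append(min(
                prev[j] + 1,         # delete seg_a
                curr[j - 1] + 1,     # insert seg_b
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance divided by the longer sequence length (an assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Hypothetical transcriptions of the word "please", one segment per list item.
native = ["p", "l", "i", "z"]
non_native = ["p", "ə", "l", "i", "s"]

print(levenshtein(native, non_native))          # one insertion + one substitution
print(normalized_distance(native, non_native))
```

Under these assumptions, a larger distance from the native transcription would correspond to a less native-like pronunciation, which is consistent with the negative correlation reported above.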
