Extending a North American English category learner to a non-standard variety: Categorizing vowels across speech styles in Glaswegian English

Despite much research on how distributional category learning models perform on Standard North American English (e.g., Feldman [2], de Boer and Kuhl [1], McMurray et al. [10], Vallabha et al. [16], and many others), statistical learning of vowel categories in other regional varieties remains largely unaddressed in the computational literature. This paper applies an unsupervised infinite mixture model (as developed in Feldman [2]) to vowels from a corpus of Glaswegian English sociolinguistic interviews. Although the model was originally developed for North American English vowels recorded in the carrier syllables devised by Hillenbrand et al. [4] to limit variation due to phonetic context, the distributional learner also categorized vowels largely correctly across the speech styles common to sociolinguistic interviews. This result shows that the distributional learner performs relatively well on running Glaswegian English speech, where vowel categories overlap extensively. It demonstrates that computational models of category acquisition can handle more complex input than minimal pair lists and can be used with naturally occurring speech from non-standard regional varieties.

[1] William Labov, et al. The atlas of North American English: phonetics, phonology and sound change: a multimedia reference tool, 2006.

[2] Richard N. Aslin, et al. Statistical learning of phonetic categories: insights from a computational approach, 2009, Developmental Science.

[3] J. Hillenbrand, et al. Acoustic characteristics of American English vowels, 1994, The Journal of the Acoustical Society of America.

[4] Sharon Goldwater, et al. A role for the developing lexicon in phonetic category acquisition, 2013, Psychological Review.

[5] K. Stevens, et al. Linguistic experience alters phonetic perception in infants by 6 months of age, 1992, Science.

[6] Susan Fitt, et al. Synthesis of regional English using a keyword lexicon, 1999, EUROSPEECH.

[7] Elizabeth K. Johnson, et al. Statistical learning of tone sequences by human infants and adults, 1999, Cognition.

[8] Hadley Wickham, et al. ggplot2 - Elegant Graphics for Data Analysis (2nd Edition), 2017.

[9] James L. McClelland, et al. Unsupervised learning of vowel categories from infant-directed speech, 2007, Proceedings of the National Academy of Sciences.

[10] J. Scobbie, et al. Acquisition of Scottish English Phonology: an overview, 2006.

[11] Bart de Boer, et al. Investigating the role of infant-directed speech with a computer model, 2003.

[12] B. MacWhinney. The CHILDES project: tools for analyzing talk, 1992.

[13] Peter Wittenburg, et al. ELAN: a Professional Framework for Multimodality Research, 2006, LREC.

[14] R Core Team, et al. R: A language and environment for statistical computing, 2014.