From categories to gradience: Auto-coding sociophonetic variation with random forests

Coding sociophonetic variables that are typically treated as categorical is time-consuming, which impedes research questions about these variables that require large volumes of data. In this paper, we apply a machine learning method, random forest classification (Breiman, 2001), to automate coding (categorical prediction) of two English sociophonetic variables traditionally treated as categorical, non-prevocalic /r/ and word-medial intervocalic /t/, based on tokens’ acoustic signatures. We found good performance for binary classifiers of non-prevocalic /r/ (Absent versus Present) and medial /t/ (Voiced versus Voiceless), but not for medial /t/ with a six-way coding distinction (largely because some codes were sparsely represented in the training data). This method also yields rankings of acoustic measures by their importance in classification. Beyond any individual measure, this method generates probabilistic predictions of variation (classifier probabilities) that represent a composite of the acoustic cues fed into the model. In a listening experiment, we found that classifier probabilities not only significantly captured gradience in trained listeners’ perceptions of rhoticity, but also predicted those perceptions better than individual acoustic measures did. This method thus represents a new approach to reconciling the categorical and continuous dimensions of sociophonetic variation.
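To make the notion of a classifier probability concrete, the following is a minimal sketch of a random forest for a binary Absent/Present distinction, written in plain Python. It is not the implementation or the data used in the paper: the tokens are synthetic, the two predictors (an F3-like measure and a pure-noise measure) are hypothetical stand-ins for real acoustic measures, and the probability is computed simply as the share of trees voting Present, which is a simplification of what statistical packages do.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, n_try, rng):
    """Search a random subset of features for the split with the largest impurity drop."""
    best, parent = None, gini(labels)
    for f in rng.sample(range(len(rows[0])), n_try):
        for t in sorted({r[f] for r in rows})[:-1]:  # skip max so both sides are non-empty
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best

def grow(rows, labels, n_try, rng, depth=0, max_depth=4):
    """Grow one tree; internal nodes are tuples, leaves are majority labels."""
    split = best_split(rows, labels, n_try, rng) if depth < max_depth else None
    if split is None or split[0] <= 0:
        return Counter(labels).most_common(1)[0][0]
    _, f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            grow([rows[i] for i in li], [labels[i] for i in li], n_try, rng, depth + 1, max_depth),
            grow([rows[i] for i in ri], [labels[i] for i in ri], n_try, rng, depth + 1, max_depth))

def tree_predict(node, row):
    """Follow the splits down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=50, seed=1):
    """Fit each tree on a bootstrap sample, trying sqrt(#features) features per split."""
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample tokens with replacement
        forest.append(grow([rows[i] for i in idx], [labels[i] for i in idx], n_try, rng))
    return forest

def predict_proba(forest, row, positive="Present"):
    """Classifier probability, taken here as the share of trees voting `positive`."""
    votes = [tree_predict(tree, row) for tree in forest]
    return votes.count(positive) / len(votes)

# Synthetic tokens (assumption): Present tokens get a lower F3-like value;
# the second measure is uninformative noise.
data_rng = random.Random(0)
rows, labels = [], []
for _ in range(100):
    present = data_rng.random() < 0.5
    rows.append([data_rng.gauss(1900 if present else 2600, 200), data_rng.gauss(0, 1)])
    labels.append("Present" if present else "Absent")

forest = fit_forest(rows, labels)
p_clear = predict_proba(forest, [1700, 0.0])   # clearly Present-like token
p_mid = predict_proba(forest, [2250, 0.0])     # token near the category boundary
p_absent = predict_proba(forest, [2900, 0.0])  # clearly Absent-like token
print(p_clear, p_mid, p_absent)
```

Thresholding the probability at 0.5 recovers a categorical code, while the probability itself gives a gradient score that can be compared against listeners' judgments.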

[1]  Sravana Reddy,et al.  A Web Application for Automated Dialect Analysis , 2015, HLT-NAACL.

[2]  Jennifer Hay,et al.  How Rhoticity Became /r/-sandhi , 2005 .

[3]  N. Nagy,et al.  Boston (r): Neighbo(r)s nea(r) and fa(r) , 2010, Language Variation and Change.

[4]  Lauren Hall-Lew,et al.  Perceptual coding reliability of (L)-vocalization in casual speech data , 2012 .

[5]  Natasha Warner,et al.  Phonetic variability of stops and flaps in spontaneous and careful speech. , 2011, The Journal of the Acoustical Society of America.

[6]  Marc Brysbaert,et al.  Power Analysis and Effect Size in Mixed Effects Models: A Tutorial , 2018, Journal of cognition.

[7]  F. E. Satterthwaite An approximate distribution of estimates of variance components. , 1946, Biometrics.

[8]  R. O’Brien,et al.  A Caution Regarding Rules of Thumb for Variance Inflation Factors , 2007 .

[9]  R. B. Irwin Consistency of judgments of articulatory productions. , 1970, Journal of speech and hearing research.

[10]  James M Scobbie,et al.  Derhoticisation in Scottish English: a sociophonetic journey , 2014 .

[11]  Leendert Plug,et al.  Lenition, fortition and the status of plosive affrication: the case of spontaneous RP English /t/ , 2012, Phonology.

[12]  Daiki Hashimoto Loanword phonology in New Zealand English : exemplar activation and message predictability. , 2019 .

[13]  D. A. Kenny,et al.  Experiments with More Than One Random Factor: Designs, Analytic Models, and Statistical Power , 2017, Annual review of psychology.

[14]  Klaus Nordhausen,et al.  The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition by Trevor Hastie, Robert Tibshirani, Jerome Friedman , 2009 .

[15]  Robin Dodsworth,et al.  Urban rejection of the vernacular: The SVS undone , 2012, Language Variation and Change.

[16]  C. Hall,et al.  Corpus-Based Sociophonetic Approaches to Postvocalic R-Lessness in African American Language , 2019, American Speech.

[17]  Leonhard Held,et al.  Spatio-Temporal Analysis of Epidemic Phenomena Using the R Package surveillance , 2014, ArXiv.

[18]  Kirsty McDougall,et al.  The acoustic character of fricated /t/ in Australian English: A comparison with /s/ and /ʃ/ , 2009, Journal of the International Phonetic Association.

[19]  R. Purse The articulatory reality of coronal stop 'deletion' , 2019 .

[20]  Roel Smits,et al.  Acoustical and perceptual analysis of the voicing distinction in Dutch initial plosives: the role of prevoicing , 2004, J. Phonetics.

[21]  Lou Boves,et al.  Acoustic reduction in conversational Dutch: A quantitative analysis based on automatically generated segmental transcriptions , 2011, J. Phonetics.

[22]  Kara Becker /r/ and the construction of place identity on New York City's Lower East Side , 2009 .

[23]  Jalal Al-Tamimi Revisiting acoustic correlates of pharyngealization in Jordanian and Moroccan Arabic: Implications for formal representations , 2017 .

[24]  D. Ruppert The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .

[25]  J. Stuart-Smith A sociophonetic investigation of postvocalic /r/ in Glaswegian adolescents , 2007 .

[26]  Bodo Winter,et al.  Phonetics and politeness: Perceiving Korean honorific and non-honorific speech through phonetic cues , 2014 .

[27]  Carmen Llamas,et al.  Fricated realisations of /t/ in Dublin and Middlesbrough English: an acoustic analysis of plosive frication and surface fricative contrasts , 2008, English Language and Linguistics.

[28]  Janet Holmes,et al.  New Zealand Flappers: An Analysis of T Voicing in New Zealand English , 1994 .

[29]  Andreas Ziegler,et al.  ranger: A Fast Implementation of Random Forests for High Dimensional Data in C++ and R , 2015, arXiv:1508.04409.

[30]  Mario Chica-Olmo,et al.  An assessment of the effectiveness of a random forest classifier for land-cover classification , 2012 .

[31]  R. Harald Baayen,et al.  Models, forests, and trees of York English: Was/were variation as a case study for statistical practice , 2012, Language Variation and Change.

[32]  Lynn Clark,et al.  Priming as a Motivating Factor in Sociophonetic Variation and Change , 2018, Top. Cogn. Sci..

[33]  Abby Walker,et al.  Football versus football: Effect of topic on /r/ realization in American and English sports fans , 2013, Language and speech.

[34]  Paul Foulkes,et al.  The evolution of medial /t/ over real and remembered time , 2016 .

[35]  D. A. Kenny,et al.  Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. , 2014, Journal of experimental psychology. General.

[36]  W. Labov,et al.  One Hundred Years of Sound Change in Philadelphia: Linear Incrementation, Reversal, and Reanalysis , 2013 .

[37]  D. Barr,et al.  Random effects structure for confirmatory hypothesis testing: Keep it maximal. , 2013, Journal of memory and language.

[38]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[39]  J. Scobbie,et al.  The role of gesture delay in coda /r/ weakening: An articulatory, auditory and acoustic study. , 2018, The Journal of the Acoustical Society of America.

[40]  William D. Raymond,et al.  The Buckeye corpus of conversational speech: labeling conventions and a test of transcriber reliability , 2005, Speech Commun..

[41]  Suzanne Boyce,et al.  A magnetic resonance imaging-based articulatory and acoustic study of "retroflex" and "bunched" American English /r/. , 2008, The Journal of the Acoustical Society of America.

[42]  Davide Chicco,et al.  Ten quick tips for machine learning in computational biology , 2017, BioData Mining.

[43]  James Sneed German,et al.  Reassignment of consonant allophones in rapid dialect acquisition , 2013, J. Phonetics.

[45]  Mark A. Pitt,et al.  The buckeye corpus of speech: updates and enhancements , 2007, INTERSPEECH.

[46]  Marianna Kennedy Variation in the Pronunciation of English by New Zealand School Children , 2006 .

[47]  Russell S. Kirby,et al.  The Atlas of North American English: Phonetics, Phonology and Sound Change. A Multimedia Reference Tool , 2007 .

[48]  Peter Wittenburg,et al.  Annotation by Category: ELAN and ISO DCR , 2008, LREC.

[49]  Morgan Sonderegger,et al.  Automatic measurement of voice onset time using discriminative structured prediction. , 2012, The Journal of the Acoustical Society of America.

[50]  A. Zeileis,et al.  Danger: High Power! – Exploring the Statistical Properties of a Test for Random Forest Variable Importance , 2008 .

[51]  D. Bates,et al.  Fitting Linear Mixed-Effects Models Using lme4 , 2014, arXiv:1406.5823.

[52]  Sanford Weisberg,et al.  An R Companion to Applied Regression , 2010 .

[53]  Victor Kuperman,et al.  The Random Forests statistical technique: An examination of its value for the study of reading , 2016, Scientific studies of reading : the official journal of the Society for the Scientific Study of Reading.

[54]  Jennifer Hay,et al.  LaBB-CAT: an Annotation Store , 2012, ALTA.

[55]  Jennifer Hay,et al.  "Kia ora. This is my earthquake story". Multiple applications of a sociolinguistic corpus , 2016 .

[56]  R. Fiasson Allophonic imitation within and across word positions , 2015 .

[57]  J. Hay,et al.  Hearing r-sandhi: The role of past experience , 2018 .

[58]  Damaris Zurell,et al.  Collinearity: a review of methods to deal with it and a simulation study evaluating their performance , 2013 .

[59]  Bodo Winter,et al.  What makes a word prominent? Predicting untrained German listeners' perceptual judgments , 2018, J. Phonetics.

[60]  Margaret Maclagan,et al.  /r/-sandhi in early 20th century New Zealand English , 2012 .

[61]  John C. Wells,et al.  Accents of English , 1982 .

[62]  G. Tutz,et al.  An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. , 2009, Psychological methods.

[63]  R. Temple Where and what is (t, d)?: A case study in taking a step back in order to advance sociophonetics , 2014 .

[64]  J. Scobbie,et al.  A Socio-Articulatory Study of Scottish Rhoticity , 2014 .

[65]  Morgan Sonderegger,et al.  Mixed-effects design analysis for experimental phonetics , 2018, J. Phonetics.

[66]  Achim Zeileis,et al.  Conditional variable importance for random forests , 2008, BMC Bioinformatics.

[67]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[68]  Barbara Schuppler,et al.  How linguistic and probabilistic properties of a word affect the realization of its final /t/: Studies at the phonemic and sub-phonemic level , 2012, J. Phonetics.

[69]  Shohreh Kasaei,et al.  Persian handwritten digit recognition by random forest and convolutional neural networks , 2015, 2015 9th Iranian Conference on Machine Vision and Image Processing (MVIP).

[70]  Morgan Sonderegger,et al.  Montreal Forced Aligner: Trainable Text-Speech Alignment Using Kaldi , 2017, INTERSPEECH.

[71]  Oliver Chiu-sing Choy,et al.  An efficient MFCC extraction method in speech recognition , 2006, 2006 IEEE International Symposium on Circuits and Systems.

[72]  J. Hay,et al.  Perceptions of regional dialects in New Zealand , 2005 .

[73]  Anastasia K. Rieh American English Flapping: Evidence Against Paradigm Uniformity with Phonetic Features , 2003 .

[74]  Natheer Khasawneh,et al.  Automated sleep stage identification system based on time-frequency analysis of a single EEG channel and random forest classifier , 2012, Comput. Methods Programs Biomed..