Automatic analysis of slips of the tongue: Insights into the cognitive architecture of speech production

Traces of the cognitive mechanisms underlying speaking can be found within subtle variations in how we pronounce sounds. While speech errors have traditionally been seen as categorical substitutions of one sound for another, acoustic and articulatory analyses show that they partially reflect the intended sound. When "pig" is mispronounced as "big," the resulting /b/ sound differs from correct productions of "big," moving towards the intended "pig" and revealing the role of graded sound representations in speech production. Investigating the origins of such phenomena requires detailed estimation of speech sound distributions; this has been hampered by reliance on subjective, labor-intensive manual annotation. Computational methods can address these issues by providing objective, automatic measurements. We develop a novel high-precision computational approach, based on a set of machine learning algorithms, for measurement of elicited speech. The algorithms are trained on existing manually labeled data to detect and locate linguistically relevant acoustic properties with high accuracy. Our approach is robust, is designed to handle mis-productions, and overall matches the performance of expert coders. It allows us to analyze a very large dataset of speech errors (containing far more errors than the total in the existing literature), illuminating properties of speech sound distributions that were previously impossible to observe reliably. We argue that this provides novel evidence that two sources both contribute to deviations in speech errors: planning processes specifying the targets of articulation and articulatory processes specifying the motor movements that execute this plan. These findings illustrate how a much richer picture of speech provides an opportunity to gain novel insights into language processing.
