Predicting human perception and ASR classification of word-final [t] by its acoustic sub-segmental properties

This paper presents a study of the acoustic sub-segmental properties of word-final /t/ in conversational standard Dutch and of how these properties contribute to whether human listeners and an ASR system classify the /t/ as acoustically present or absent. In general, humans and the ASR system use the same cues (presence of a constriction, a burst, and alveolar friction), but the ASR system is less sensitive than human listeners to fine cues (weak bursts, smoothly starting friction) and is misled by the presence of glottal vibration. These data inform the further development of models of human and automatic speech processing.

Index Terms: Sub-segmental Acoustic Properties, Automatic Transcription, Human Speech Perception, Dutch
