The contribution of obstruent consonants and acoustic landmarks to speech recognition in noise.

Obstruent consonants (e.g., stops) are more susceptible to noise than vowels, raising the question of whether the degradation of speech intelligibility in noise can be attributed, at least partially, to the loss of information carried by obstruent consonants. Experiment 1 assessed the contribution of obstruent consonants to speech recognition in noise by presenting sentences containing clean obstruent consonants but noise-corrupted voiced sounds (e.g., vowels). Results indicated a substantial (threefold) improvement in speech recognition, particularly at low signal-to-noise ratios (-5 dB). Experiment 2 assessed the importance of providing partial information, within a limited frequency region, of the obstruent-consonant spectra while leaving the remaining spectral region unaltered (i.e., noise corrupted). Access to the low-frequency (0-1000 Hz) region of the clean obstruent-consonant spectra was sufficient to yield significant improvements in performance, an effect attributed to improved transmission of voicing information. The outcomes of the two experiments suggest that much of the improvement in performance is due to enhanced access to acoustic landmarks, evident in the spectral discontinuities that signal the onsets of obstruent consonants. These landmarks, often blurred in noisy conditions, are critically important for understanding speech in noise because they help listeners determine syllable structure and word boundaries.
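The stimulus manipulation of Experiment 1 can be illustrated with a minimal sketch. It is not the authors' procedure. It assumes a sampled clean signal, a noise signal of equal length, and a hypothetical list of obstruent segment boundaries given as sample indices. The function names (`mix_at_snr`, `splice_clean_segments`), the toy sine-wave "speech", and the abrupt splicing without cross-fading are all assumptions made for illustration.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so the clean-to-noise RMS ratio equals snr_db, then add it."""
    gain = rms(clean) / (rms(noise) * 10 ** (snr_db / 20))
    return [c + gain * n for c, n in zip(clean, noise)]


def splice_clean_segments(clean, noisy, segments):
    """Copy the noisy signal, restoring clean samples inside each [start, end) segment."""
    out = list(noisy)
    for start, end in segments:
        out[start:end] = clean[start:end]
    return out


# Toy demonstration: a 200 Hz tone stands in for speech (assumption).
random.seed(0)
fs = 16000
clean = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs)]
noise = [random.gauss(0.0, 1.0) for _ in range(fs)]

noisy = mix_at_snr(clean, noise, snr_db=-5.0)

# Hypothetical obstruent boundaries in samples; real ones would come
# from a phonetic segmentation of the sentence.
obstruent_segments = [(1000, 1800), (9000, 9600)]
stimulus = splice_clean_segments(clean, noisy, obstruent_segments)
```

In the resulting `stimulus`, samples inside the listed segments are noise free while everything else remains corrupted at -5 dB, which mirrors the "clean obstruents, noisy voiced sounds" condition described above. The band-limited condition of Experiment 2 would additionally require filtering within each segment and is not shown here.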
