Phoneme-group specific octave-band weights in predicting speech intelligibility

In an earlier study we derived robust frequency-weighting functions for predicting the intelligibility of short nonsense words. Such frequency-weighting functions are used in objective intelligibility measures such as the speech transmission index (STI). Six independent experiments revealed essentially similar frequency-weighting functions for predicting nonsense-word scores, irrespective of signal-to-noise ratio and gender [Speech Communication 28 (1999) 109]. Although the frequency weightings do not vary significantly with signal-to-noise ratio or gender, other studies have shown that different types of speech material (i.e., nonsense words, phonetically balanced words and connected discourse) yield quite different frequency-weighting functions. This may be related to the distribution of specific phonemes in the test material. To obtain a more generic description of the frequency weighting, four relevant groups of phonemes were identified. In conditions of reduced intelligibility, phonemes were rarely confused across groups but frequently confused within each group. For each group, a specific frequency-weighting function was derived that gave a good prediction of the phoneme-group scores. Word scores could then be predicted from these (weighted) phoneme-group scores with an accuracy of about 4%, which corresponds to a signal-to-noise ratio of about 1 dB. Hence, this method provides a more generic way to predict intelligibility scores for different types of speech material.
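The abstract does not give the weighting functions themselves, so the following is only a minimal Python sketch of the general idea, with every number a placeholder. Each octave band's apparent signal-to-noise ratio is mapped to a transmission index using the conventional STI clipping to ±15 dB; each of four hypothetical phoneme groups (`group_1` to `group_4`, not the paper's groups) has its own hypothetical weight vector; and the group indices are combined into a word-level index by a simple share-weighted average. That last combination step is an assumption: the paper's actual mapping from phoneme-group scores to word scores, and the conversion from an index to a percentage score, are not described here. The sketch also skips the modulation-transfer step and the adjacent-band redundancy correction of the revised STI.

```python
import math
from typing import Dict, List

# Seven octave-band centre frequencies conventionally used by the STI (Hz).
OCTAVE_BANDS_HZ: List[int] = [125, 250, 500, 1000, 2000, 4000, 8000]

# HYPOTHETICAL octave-band weights per phoneme group (placeholders, not the
# paper's values). Each row has one weight per band and sums to 1.
GROUP_WEIGHTS: Dict[str, List[float]] = {
    "group_1": [0.10, 0.20, 0.25, 0.20, 0.15, 0.05, 0.05],
    "group_2": [0.02, 0.05, 0.10, 0.18, 0.25, 0.25, 0.15],
    "group_3": [0.05, 0.10, 0.15, 0.25, 0.25, 0.15, 0.05],
    "group_4": [0.00, 0.05, 0.15, 0.20, 0.20, 0.20, 0.20],
}


def transmission_index(snr_db: float) -> float:
    """Map an apparent band SNR (dB) to a transmission index in [0, 1].

    Clips the SNR to [-15, +15] dB and scales it linearly, as in the STI.
    """
    clipped = max(-15.0, min(15.0, snr_db))
    return (clipped + 15.0) / 30.0


def weighted_index(band_snrs_db: List[float], weights: List[float]) -> float:
    """Weighted sum of per-band transmission indices for one weight vector."""
    if len(band_snrs_db) != len(weights):
        raise ValueError("need one weight per octave band")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("band weights must sum to 1")
    return sum(w * transmission_index(s) for w, s in zip(weights, band_snrs_db))


def predicted_word_index(band_snrs_db: List[float],
                         group_shares: Dict[str, float]) -> float:
    """HYPOTHETICAL word-level index: share-weighted average of group indices.

    group_shares gives the relative occurrence of each phoneme group in the
    test material; it is normalised here, so it need not sum to 1.
    """
    total = sum(group_shares.values())
    if total <= 0:
        raise ValueError("group shares must have a positive sum")
    return sum(
        share * weighted_index(band_snrs_db, GROUP_WEIGHTS[group])
        for group, share in group_shares.items()
    ) / total


if __name__ == "__main__":
    # Example: SNR falling from +6 dB in the lowest band to -12 dB in the highest.
    snrs = [6.0, 3.0, 0.0, -3.0, -6.0, -9.0, -12.0]
    shares = {"group_1": 0.4, "group_2": 0.3, "group_3": 0.2, "group_4": 0.1}
    for name, w in GROUP_WEIGHTS.items():
        print(name, round(weighted_index(snrs, w), 3))
    print("word-level (hypothetical)", round(predicted_word_index(snrs, shares), 3))
```

With the high bands degraded as in the example, a group whose placeholder weights favour high frequencies (here `group_2`) gets a lower index than one weighted toward the mid bands, so materials with different group shares end up with different word-level indices.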

[1] Herman J. M. Steeneken, et al. Mutual dependence of the octave-band weights in predicting speech intelligibility, 1999, Speech Communication.

[2] T. Houtgast, et al. A physical method for measuring speech-transmission quality, 1980, The Journal of the Acoustical Society of America.

[3] C. V. Pavlovic, et al. An evaluation of some assumptions underlying the articulation index, 1984, The Journal of the Acoustical Society of America.

[4] G. A. Miller, et al. An Analysis of Perceptual Confusions Among Some English Consonants, 1955.

[5] C. V. Pavlovic, et al. A frequency importance function for continuous discourse, 1987, The Journal of the Acoustical Society of America.

[6] Herman J. M. Steeneken, et al. Validation of the revised STIr method, 2002, Speech Communication.

[7] Herman J. M. Steeneken, et al. Objective and diagnostic assessment of (isolated) word recognizers, 1989, International Conference on Acoustics, Speech, and Signal Processing.

[8] T. Houtgast, et al. A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria, 1985.

[9] H. J. M. Steeneken, et al. On measuring and predicting speech intelligibility, 1992.

[10] J. C. Steinberg, et al. Factors Governing the Intelligibility of Speech Sounds, 1945.

[11] A. Bronkhorst, et al. A model for context effects in speech recognition, 1993, The Journal of the Acoustical Society of America.

[12] G. A. Miller, et al. Erratum: An Analysis of Perceptual Confusions Among Some English Consonants [J. Acoust. Soc. Am. 27, 339 (1955)], 1955.

[13] C. V. Pavlovic, et al. Frequency importance functions for a feature recognition test material, 1988, The Journal of the Acoustical Society of America.

[14] C. V. Pavlovic, et al. Derivation of primary parameters and procedures for use in speech intelligibility predictions, 1987, The Journal of the Acoustical Society of America.

[15] John C. Wells. Computer-coded phonetic transcription, 1987.