Relative energy and intelligibility of transient speech components

It is generally recognized that consonants are more critical than vowels to speech intelligibility, but we suggest that important information is contained in transient speech components, rather than the quasi-steady-state components of both consonants and vowels. Fixed-frequency filters cannot uniquely separate transients from the more steady-state vowel formants and consonant hubs, even though the former are predominantly low frequency and the latter, high frequency. To study the relative speech intelligibility of the transient versus steady-state components, we employed an algorithm based on time-frequency analysis to extract quasi-steady-state energy from the speech signal, leaving a residual signal of predominantly transient components. Psychometric functions were measured for speech recognition of processed and unprocessed monosyllabic words. The transient components were found to account for approximately 2% of the energy of the original speech, yet were nearly as intelligible as the original speech. As hypothesized, the quasi-steady-state components contained much greater energy while providing significantly less intelligibility.
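The relative-energy figure quoted above is a ratio of signal energies, which can be illustrated with a minimal sketch. The code below does not reproduce the time-frequency extraction algorithm used in the study. It assumes a toy signal whose steady and transient parts are known by construction, and all names and values (`energy_fraction`, the 200 Hz tone, the 5 ms burst) are hypothetical, chosen only to show how a brief component can carry a few percent of the total energy.

```python
import math


def energy(signal):
    """Total energy of a discrete signal: the sum of squared samples."""
    return sum(x * x for x in signal)


def energy_fraction(component, original):
    """Share of the original signal's energy carried by one component."""
    total = energy(original)
    if total == 0:
        raise ValueError("original signal has zero energy")
    return energy(component) / total


# Toy example with hypothetical values (not data from the study).
fs = 8000          # sampling rate in Hz
n = fs // 2        # 0.5 s of samples

# Sustained 200 Hz tone standing in for a quasi-steady-state component.
steady = [math.sin(2 * math.pi * 200 * t / fs) for t in range(n)]

# Brief 5 ms burst at 3 kHz standing in for a transient component.
burst_start, burst_len = 1000, 40
transient = [0.0] * n
for t in range(burst_start, burst_start + burst_len):
    transient[t] = 1.5 * math.sin(2 * math.pi * 3000 * (t - burst_start) / fs)

# The "original" is the sum of the two parts, so the split is known exactly.
original = [s + r for s, r in zip(steady, transient)]

transient_share = energy_fraction(transient, original)
steady_share = energy_fraction(steady, original)
print(f"transient share of energy: {transient_share:.1%}")
print(f"steady share of energy:    {steady_share:.1%}")
```

In this toy case the burst lasts only 5 ms of a 500 ms signal, so its share of the total energy is small even though its amplitude is higher than that of the tone. With real speech the two components are not known in advance and must be estimated, and the component energies sum to the total only when the components are uncorrelated.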
