Role of modulation magnitude and phase spectrum towards speech intelligibility

In this paper, we investigate the properties of the modulation domain and, more specifically, evaluate the relative contributions of the modulation magnitude and modulation phase spectra to speech intelligibility. To this end, we extend the traditional (acoustic domain) analysis-modification-synthesis framework to include modulation domain processing, and use it to construct stimuli that retain only selected spectral components for objective and subjective intelligibility tests. We conduct three experiments. The first investigates the relative contributions of the modulation magnitude, modulation phase, and acoustic phase spectra to intelligibility. The second investigates the effect of modulation frame duration on intelligibility when processing the modulation magnitude spectrum. The third investigates the same effect when processing the modulation phase spectrum. The results show that both the modulation magnitude and modulation phase spectra are important for speech intelligibility, and that including acoustic phase information yields a significant improvement. They also show that shorter modulation frame durations improve intelligibility when processing the modulation magnitude spectrum, while longer frame durations improve intelligibility when processing the modulation phase spectrum.
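
As a rough illustration of the analysis stage of such a framework, the sketch below computes an acoustic short-time Fourier transform, then applies a second short-time Fourier transform to the time trajectory of each acoustic magnitude bin to obtain modulation magnitude and modulation phase spectra. It is a minimal sketch and not the paper's implementation: the function names, the Hamming window, and all frame lengths and hops are assumptions chosen only for illustration.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Slice a 1-D array into overlapping frames of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def stft(x, frame_len, hop):
    """Hamming-windowed short-time Fourier transform of a real 1-D array."""
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    return np.fft.rfft(frames, axis=1)


def modulation_spectra(x, ac_len, ac_hop, mod_len, mod_hop):
    """Return acoustic phase, modulation magnitude, and modulation phase.

    The modulation arrays have shape
    (n_acoustic_bins, n_modulation_frames, n_modulation_bins).
    """
    X = stft(x, ac_len, ac_hop)          # acoustic spectrum: (frames, bins)
    ac_mag, ac_phase = np.abs(X), np.angle(X)
    # Second STFT along time, one trajectory per acoustic frequency bin.
    mod = np.stack(
        [stft(ac_mag[:, k], mod_len, mod_hop) for k in range(ac_mag.shape[1])],
        axis=0,
    )
    return ac_phase, np.abs(mod), np.angle(mod)


# Illustrative parameters only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
x = rng.standard_normal(8000)            # stand-in for 1 s of speech at 8 kHz
ac_phase, mod_mag, mod_phase = modulation_spectra(
    x, ac_len=256, ac_hop=32, mod_len=64, mod_hop=8
)
```

A stimulus that retains only selected components could then be built by recombining, for example, the original modulation magnitude with a replaced modulation phase before inverting both transforms; the synthesis stage (inverse transforms and overlap-add) is omitted here.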
