Multiband analysis and synthesis of spectro-temporal modulations of Fourier spectrogram.

The two-dimensional spectro-temporal modulation filtering concept of the auditory model [T. Chi, P. Ru, and S. A. Shamma, J. Acoust. Soc. Am. 118(2), 887-906 (2005)] is implemented on the Fourier spectrogram. The Fourier magnitude spectrogram is analyzed in terms of its joint spectro-temporal modulations, which embed the temporal dynamics and spectral structures. Instead of iterative projection methods, the overlap-and-add method is adopted to invert modified Fourier spectrograms back to sounds. The proposed framework not only provides a similar spectro-temporal analytical process for sounds as the auditory model but also produces synthesized sounds with better quality in a timely manner, which makes proposed framework feasible to human speech recognition (HSR) applications as well.

[1]  Qin Li,et al.  Homomorphic modulation spectra , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Thomas F. Quatieri,et al.  2-d Processing of Speech with Application to Pitch Estimation , 2002, INTERSPEECH.

[3]  Antony William Rix,et al.  Perceptual evaluation of speech quality (PESQ): The new ITU standard for end-to-end speech quality a , 2002 .

[4]  Patrik O. Hoyer,et al.  Non-negative Matrix Factorization with Sparseness Constraints , 2004, J. Mach. Learn. Res..

[5]  R. Patterson,et al.  Time-domain modeling of peripheral auditory processing: a modular architecture and a software platform. , 1995, The Journal of the Acoustical Society of America.

[6]  Nima Mesgarani,et al.  Denoising in the Domain of Spectrotemporal Modulations , 2007, EURASIP J. Audio Speech Music. Process..

[7]  Tony Ezzat,et al.  Spectro-temporal analysis of speech using 2-d Gabor filters , 2007, INTERSPEECH.

[8]  Jae Lim,et al.  Signal estimation from modified short-time Fourier transform , 1984 .

[9]  Philipos C. Loizou,et al.  Speech Enhancement: Theory and Practice , 2007 .

[10]  Mounya Elhilali,et al.  A spectro-temporal modulation index (STMI) for assessment of speech intelligibility , 2003, Speech Commun..

[11]  Thomas F. Quatieri,et al.  High-Pitch Formant Estimation by Exploiting Temporal Change of Pitch , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[12]  Tai-Shih Chi,et al.  FFT-based spectro-temporal analysis and synthesis of sounds , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13]  Birger Kollmeier,et al.  Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition , 2011, Speech Commun..

[14]  Powen Ru,et al.  Multiresolution spectrotemporal analysis of complex sounds. , 2005, The Journal of the Acoustical Society of America.