Paper information - Multiband analysis and synthesis of spectro-temporal modulations of Fourier spectrogram.

The two-dimensional spectro-temporal modulation filtering concept of the auditory model [T. Chi, P. Ru, and S. A. Shamma, J. Acoust. Soc. Am. 118(2), 887-906 (2005)] is implemented on the Fourier spectrogram. The Fourier magnitude spectrogram is analyzed in terms of its joint spectro-temporal modulations, which embed the temporal dynamics and spectral structures. Instead of iterative projection methods, the overlap-and-add method is adopted to invert modified Fourier spectrograms back to sounds. The proposed framework not only analyzes sounds with a spectro-temporal process similar to that of the auditory model but also produces synthesized sounds with better quality in a timely manner, which makes the proposed framework feasible for human speech recognition (HSR) applications as well.
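The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the Hann window, the frame and hop sizes, the single rectangular low-pass modulation mask (standing in for a multiband filter bank), and the reuse of the original phase are all assumptions made for illustration. The sketch takes the 2-D Fourier transform of the magnitude spectrogram to reach the joint rate (temporal) and scale (spectral) modulation domain, masks it, and inverts the modified spectrogram by overlap-and-add rather than by iterative projection.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Overlap-and-add inversion with window-squared normalization."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    length = n_fft + hop * (spec.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    return out / np.maximum(norm, 1e-8)

def modulation_filter(mag, rate_cut=0.25, scale_cut=0.25):
    """Keep only slow temporal (rate) and spectral (scale) modulations.

    Assumption: a rectangular low-pass mask in the 2-D Fourier domain of
    the magnitude spectrogram, not the filters used in the paper.
    """
    mod = np.fft.fft2(mag)
    rate = np.abs(np.fft.fftfreq(mag.shape[0]))   # cycles per frame
    scale = np.abs(np.fft.fftfreq(mag.shape[1]))  # cycles per frequency bin
    mask = (rate[:, None] <= rate_cut) & (scale[None, :] <= scale_cut)
    filtered = np.real(np.fft.ifft2(mod * mask))
    return np.maximum(filtered, 0.0)  # magnitudes cannot be negative

def filter_and_resynthesize(x, rate_cut=0.25, scale_cut=0.25, n_fft=256, hop=64):
    """Modify the magnitude spectrogram and invert it with overlap-and-add."""
    spec = stft(x, n_fft, hop)
    mag = modulation_filter(np.abs(spec), rate_cut, scale_cut)
    # Assumption: the original STFT phase is reused for resynthesis.
    return istft(mag * np.exp(1j * np.angle(spec)), n_fft, hop)
```

Setting both cutoffs to 0.5 passes every modulation, so the output should match the input away from the signal edges; lowering either cutoff smooths the spectrogram along time or frequency before resynthesis.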

Tai-Shih Chi | Chung-Chien Hsu

[1] Qin Li, et al. Homomorphic modulation spectra, 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2] Thomas F. Quatieri, et al. 2-d Processing of Speech with Application to Pitch Estimation, 2002, INTERSPEECH.

[3] Antony William Rix, et al. Perceptual evaluation of speech quality (PESQ): The new ITU standard for end-to-end speech quality a, 2002.

[4] Patrik O. Hoyer, et al. Non-negative Matrix Factorization with Sparseness Constraints, 2004, J. Mach. Learn. Res.

[5] R. Patterson, et al. Time-domain modeling of peripheral auditory processing: a modular architecture and a software platform, 1995, The Journal of the Acoustical Society of America.

[6] Nima Mesgarani, et al. Denoising in the Domain of Spectrotemporal Modulations, 2007, EURASIP J. Audio Speech Music. Process.

[7] Tony Ezzat, et al. Spectro-temporal analysis of speech using 2-d Gabor filters, 2007, INTERSPEECH.

[8] Jae Lim, et al. Signal estimation from modified short-time Fourier transform, 1984.

[9] Philipos C. Loizou, et al. Speech Enhancement: Theory and Practice, 2007.

[10] Mounya Elhilali, et al. A spectro-temporal modulation index (STMI) for assessment of speech intelligibility, 2003, Speech Commun.

[11] Thomas F. Quatieri, et al. High-Pitch Formant Estimation by Exploiting Temporal Change of Pitch, 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[12] Tai-Shih Chi, et al. FFT-based spectro-temporal analysis and synthesis of sounds, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[13] Birger Kollmeier, et al. Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition, 2011, Speech Commun.

[14] Powen Ru, et al. Multiresolution spectrotemporal analysis of complex sounds, 2005, The Journal of the Acoustical Society of America.