FFT-based spectro-temporal analysis and synthesis of sounds

We implement the two-dimensional spectro-temporal modulation filtering of the auditory model [1] on the FFT spectrogram, so that the spectrogram is analyzed in terms of the temporal dynamics and the spectral structures of the sound. To invert the FFT spectrogram back to sound, we use the overlap-add (OLA) method, which is more convenient and reliable than the iterative-projection method proposed in [1]. We adopt the Non-Negative Sparse Coding (NNSC) method to demonstrate the benefit of the analysis-synthesis procedure in a noise suppression application. Even without fine-tuning of parameters, the proposed procedure offers benefits in de-noising, especially under low-SNR conditions.
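The analysis-synthesis chain described above (FFT spectrogram, two-dimensional modulation filtering, OLA inversion) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`stft`, `modulation_filter`, `istft_ola`, `denoise`), the Hann window, the frame and hop sizes, and the simple low-pass mask over temporal modulation (rate) and spectral modulation (scale) are all assumptions made for illustration. In particular, the mask stands in for the modulation filters of [1], the NNSC stage is omitted, and the phase of the input spectrogram is reused unchanged.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Hann-windowed FFT spectrogram, shape (n_frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft + 1)[:-1]          # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft_ola(X, n_fft=256, hop=64):
    """Invert a spectrogram by weighted overlap-add (OLA)."""
    win = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    out = np.zeros(n_fft + hop * (X.shape[0] - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        seg = slice(i * hop, i * hop + n_fft)
        out[seg] += frame * win               # synthesis window
        norm[seg] += win ** 2                 # accumulated window energy
    return out / np.maximum(norm, 1e-8)

def modulation_filter(mag, frame_rate, max_rate_hz, max_scale):
    """Low-pass mask in the rate-scale domain (illustrative assumption).

    rate  : temporal modulation frequency in Hz
    scale : spectral modulation in cycles per FFT bin (at most 0.5)
    """
    M = np.fft.fft2(mag)                      # axes: (time, frequency)
    rates = np.fft.fftfreq(mag.shape[0], d=1.0 / frame_rate)
    scales = np.fft.fftfreq(mag.shape[1])
    mask = ((np.abs(rates)[:, None] <= max_rate_hz)
            & (np.abs(scales)[None, :] <= max_scale))
    filtered = np.fft.ifft2(M * mask).real    # symmetric mask, real result
    return np.maximum(filtered, 0.0)          # keep magnitudes non-negative

def denoise(x, fs, n_fft=256, hop=64, max_rate_hz=16.0, max_scale=0.1):
    """Analysis, modulation filtering, and OLA synthesis in one pass."""
    X = stft(x, n_fft, hop)
    mag = modulation_filter(np.abs(X), fs / hop, max_rate_hz, max_scale)
    return istft_ola(mag * np.exp(1j * np.angle(X)), n_fft, hop)

# Usage: a slowly modulated tone in white noise (synthetic test signal).
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
clean = (1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.5 * rng.standard_normal(len(t))
enhanced = denoise(noisy, fs)
```

With an unmodified spectrogram, `istft_ola(stft(x))` reproduces the interior samples of `x`, which is the property that makes OLA a direct, non-iterative inversion. Once the magnitude has been filtered, the modified spectrogram is in general no longer the exact STFT of any signal, so the OLA output is an approximation rather than an exact inverse.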

[1]  Tamar Frankel [The theory and the practice...]. , 2001, Tijdschrift voor diergeneeskunde.

[2]  Philipos C. Loizou,et al.  Speech Enhancement: Theory and Practice , 2007 .

[3]  J. Larsen,et al.  Wind Noise Reduction using Non-Negative Sparse Coding , 2007, 2007 IEEE Workshop on Machine Learning for Signal Processing.

[4]  Mounya Elhilali,et al.  A spectro-temporal modulation index (STMI) for assessment of speech intelligibility , 2003, Speech Commun.

[5]  Tony Ezzat,et al.  Spectro-temporal analysis of speech using 2-d Gabor filters , 2007, INTERSPEECH.

[6]  Patrik O. Hoyer,et al.  Non-negative Matrix Factorization with Sparseness Constraints , 2004, J. Mach. Learn. Res.

[7]  Nima Mesgarani,et al.  Denoising in the Domain of Spectrotemporal Modulations , 2007, EURASIP J. Audio Speech Music. Process.

[8]  Thomas F. Quatieri,et al.  2-d Processing of Speech with Application to Pitch Estimation , 2002, INTERSPEECH.

[9]  Pascal Scalart,et al.  Speech enhancement based on a priori signal to noise estimation , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[10]  Birger Kollmeier,et al.  Optimization and evaluation of Gabor feature sets for ASR , 2008, INTERSPEECH.

[11]  Powen Ru,et al.  Multiresolution spectrotemporal analysis of complex sounds. , 2005, The Journal of the Acoustical Society of America.

[12]  David Gelbart,et al.  Improving word accuracy with Gabor feature extraction , 2002, INTERSPEECH.

[13]  Methods for objective and subjective assessment of quality: Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs , 2002 .