On the inversion of Mel-frequency cepstral coefficients for speech enhancement applications

Mel-frequency cepstral coefficients (MFCCs) are well established in speech processing, particularly for speaker modeling within Gaussian mixture model (GMM) speaker recognition systems. The use of GMMs for speech enhancement has only recently been proposed in the literature; the direct inversion of MFCCs, however, has not been studied. In this paper we present a means of inverting MFCCs for use in speech enhancement applications. The cepstral inversion is evaluated on the TIMIT speech corpus using perceptual evaluation of speech quality (PESQ).
