Speech Modeling Using the Complex Cepstrum

Conventional cepstral speech modeling is based on a minimum-phase parametric speech production model with an infinite impulse response. That approach approximates only the logarithmic magnitude frequency response of the corresponding speech frame. This contribution describes the principle of cepstral speech modeling using the complex cepstrum. The resulting mixed-phase vocal tract model with a finite impulse response also carries information about the phase properties of the modeled speech frame. This model approximates the speech signal with higher accuracy than the model based on the real cepstrum, but its numerical complexity and memory requirements are at least twice as high.
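The difference between the two cepstra can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, the FFT length `n_fft`, the small constant added before the logarithm, and the simple linear-phase removal are all assumptions chosen for illustration. The real cepstrum is taken as the inverse FFT of the log magnitude spectrum, and the complex cepstrum as the inverse FFT of the log magnitude plus the unwrapped phase.

```python
import numpy as np

def real_cepstrum(frame, n_fft=1024):
    """Real cepstrum: inverse FFT of the log magnitude spectrum (phase discarded)."""
    spectrum = np.fft.fft(frame, n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small constant avoids log(0)
    return np.fft.ifft(log_mag).real

def complex_cepstrum(frame, n_fft=1024):
    """Complex cepstrum: inverse FFT of log magnitude + j * unwrapped phase.

    Illustrative sketch only; n_fft is assumed even, and the integer
    linear-phase term is removed before the inverse transform.
    """
    spectrum = np.fft.fft(frame, n_fft)
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    phase = np.unwrap(np.angle(spectrum))
    half = n_fft // 2
    # The unwrapped phase at frequency pi is an integer multiple of pi;
    # that integer defines the linear-phase component to subtract.
    lin = int(np.round(phase[half] / np.pi))
    phase = phase - np.pi * lin * np.arange(n_fft) / half
    return np.fft.ifft(log_mag + 1j * phase).real

# Two toy "frames" with identical magnitude spectra but different phase:
min_phase = np.array([1.0, 0.5])   # zero inside the unit circle
max_phase = np.array([0.5, 1.0])   # zero outside the unit circle

cr_min, cr_max = real_cepstrum(min_phase), real_cepstrum(max_phase)
cc_min, cc_max = complex_cepstrum(min_phase), complex_cepstrum(max_phase)
```

In this toy case the two real cepstra coincide, while the complex cepstrum places the minimum-phase frame at positive quefrencies and the maximum-phase frame at negative quefrencies (stored at the end of the array). Keeping both halves is one way to see why a mixed-phase model needs roughly twice the coefficients of a real-cepstrum model.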
