Minimum mean squared error based warped complex cepstrum analysis for statistical parametric speech synthesis

This paper presents an approach to complex cepstrum analysis based on the minimum mean squared error criterion and describes its application to statistical parametric speech synthesis. The proposed method alleviates several issues associated with conventional complex cepstrum analysis, such as the choice of analysis window, phase unwrapping, and the need for accurate pitch marks. Given initial estimates of the warped complex cepstra and their respective analysis instants, the method iteratively optimizes the complex cepstrum in a warped quefrency domain by minimizing the mean squared error between the natural and the reconstructed speech waveforms. When applied to statistical parametric speech synthesis, the optimized complex cepstrum yields better synthesized speech quality than the complex cepstrum calculated through conventional methods, especially for emotional databases. Copyright © 2013 ISCA.
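The core idea, refining a cepstrum so that the waveform it reconstructs matches the natural one in the mean squared error sense, can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a single frame, an unwarped quefrency axis, a unit-impulse excitation (so the reconstructed waveform is just the impulse response), circular FFT-based synthesis, and plain gradient descent with a fixed step. The function names, the step size, and the toy cepstrum values are all hypothetical choices made for illustration.

```python
import numpy as np

def cepstrum_to_impulse_response(c):
    """Map a real complex-cepstrum vector to an impulse response.

    Assumed layout: c[0..] holds causal quefrencies, c[-1], c[-2], ...
    hold anticausal ones (circular indexing). H = exp(FFT(c)).
    """
    return np.fft.ifft(np.exp(np.fft.fft(c))).real

def refine_cepstrum(s, c_init, step=0.1, n_iter=500):
    """Gradient descent on the squared error between s and h(c).

    Illustrative only: no frequency warping, one frame, unit-impulse
    excitation, fixed step size (all assumptions of this sketch).
    """
    c = np.array(c_init, dtype=float)
    for _ in range(n_iter):
        h = cepstrum_to_impulse_response(c)
        e = s - h  # reconstruction error
        # Since dh[m]/dc[n] = h[(m - n) mod N], the gradient of sum(e**2)
        # is -2 * r, where r is the circular cross-correlation of e with h.
        r = np.fft.ifft(np.fft.fft(e) * np.conj(np.fft.fft(h))).real
        c += 2.0 * step * r  # steepest-descent update
    return c

# Toy target: a mixed-phase cepstrum with causal and anticausal terms.
N = 64
c_true = np.zeros(N)
c_true[1], c_true[2], c_true[-1] = 0.3, -0.1, 0.1
s = cepstrum_to_impulse_response(c_true)  # stands in for "natural" speech

c_init = np.zeros(N)  # crude initial estimate
c_est = refine_cepstrum(s, c_init)

mse_before = np.mean((s - cepstrum_to_impulse_response(c_init)) ** 2)
mse_after = np.mean((s - cepstrum_to_impulse_response(c_est)) ** 2)
print("MSE before:", mse_before, "MSE after:", mse_after)
```

Because the error is measured directly on the waveform, this toy loop needs no phase unwrapping and no analysis window. The method summarized above additionally works on a warped quefrency axis and depends on the analysis instants, neither of which is modeled here.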
