Multipulse Sequences for Residual Signal Modeling

In source-filter models of speech production, the residual signal, i.e., what remains after passing the speech signal through the inverse filter, contains information that is important for generating natural-sounding re-synthesized speech. Typically, the voiced regions of residual signals are regarded as a mixture of glottal pulses and noise. This paper introduces a novel approach that represents the noise component of voiced regions of residual signals through autoregressive filtering of multipulse sequences. The positions and amplitudes of the non-zero samples of these multipulse sequences are optimized through a closed-loop procedure. The method is applied to excitation modeling in statistical parametric synthesis. Experimental results indicate that multipulse-based construction of the noise component eliminates the need for run-time ad hoc procedures such as high-pass filtering and time modulation, which are common in excitation models for statistical parametric synthesizers, with no loss of synthesized speech quality. Copyright © 2011 ISCA.
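
The abstract does not spell out the closed-loop procedure, so the following is only an illustrative sketch of the general idea: choosing pulse positions and amplitudes so that the output of an autoregressive (all-pole) filter driven by a sparse pulse sequence approximates a target signal. It uses the classic greedy analysis-by-synthesis multipulse search known from speech coding, which is an assumption here and not necessarily the optimization used in the paper. The function names `impulse_response` and `greedy_multipulse`, the coefficient convention A(z) = 1 + a[0] z^-1 + ..., and the example filter are all hypothetical.

```python
import numpy as np


def impulse_response(a, n):
    """First n samples of the impulse response of the all-pole filter 1/A(z).

    Assumed convention: A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ...
    """
    h = np.zeros(n)
    for t in range(n):
        acc = 1.0 if t == 0 else 0.0
        for k, ak in enumerate(a, start=1):
            if t - k >= 0:
                acc -= ak * h[t - k]
        h[t] = acc
    return h


def greedy_multipulse(target, a, num_pulses):
    """Greedy closed-loop search for pulse positions and amplitudes.

    At each step, the pulse that most reduces the squared error between
    the target and the filtered pulse sequence is added. Returns the
    sparse pulse sequence and the remaining error signal.
    """
    target = np.asarray(target, dtype=float)
    n = len(target)
    h = impulse_response(a, n)

    # Column p of H is the filter response to a unit pulse at position p.
    H = np.zeros((n, n))
    for p in range(n):
        H[p:, p] = h[: n - p]
    energy = np.sum(H ** 2, axis=0)

    pulses = np.zeros(n)
    used = np.zeros(n, dtype=bool)
    error = target.copy()
    for _ in range(min(num_pulses, n)):
        corr = H.T @ error                 # correlation with each candidate
        gain = corr ** 2 / energy          # error reduction per candidate
        gain[used] = -np.inf               # one pulse per position
        p = int(np.argmax(gain))
        amp = corr[p] / energy[p]          # least-squares optimal amplitude
        pulses[p] = amp
        used[p] = True
        error -= amp * H[:, p]
    return pulses, error


# Usage with a hypothetical first-order filter and a random target signal.
rng = np.random.default_rng(0)
noise_target = rng.standard_normal(64)
pulses, error = greedy_multipulse(noise_target, a=[-0.9], num_pulses=8)
```

The greedy search fixes each amplitude once it has been chosen; a joint re-optimization of all amplitudes after every new position is a common refinement, omitted here to keep the sketch short.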
