Paper Information - Multipulse Sequences for Residual Signal Modeling

Multipulse Sequences for Residual Signal Modeling

In source-filter models of speech production, the residual signal (what remains after passing the speech signal through the inverse filter) contains information that is important for generating natural-sounding re-synthesized speech. Typically, the voiced regions of residual signals are regarded as a mixture of glottal pulses and noise. This paper introduces a novel approach that represents the noise component of voiced regions of residual signals through autoregressive filtering of multipulse sequences. The positions and amplitudes of the non-zero samples of these multipulse signals are optimized through a closed-loop procedure. The method is applied to excitation modeling in statistical parametric synthesis. Experimental results indicate that multipulse-based construction of the noise component eliminates the need for run-time ad hoc procedures such as high-pass filtering and time modulation, which are common in excitation models for statistical parametric synthesizers, with no loss of synthesized speech quality. Copyright © 2011 ISCA.
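
The abstract does not spell out the closed-loop procedure, so the following is only a minimal sketch of the general idea, assuming a classic greedy analysis-by-synthesis search of the kind used in multipulse speech coding rather than the authors' actual algorithm. Pulses are placed one at a time: each candidate position is scored by how much it reduces the squared error between the target signal and the autoregressively filtered pulse sequence. The function names (`impulse_response`, `synthesize`, `greedy_multipulse`), the first-order filter coefficient, and the toy target are all illustrative assumptions, not taken from the paper.

```python
def impulse_response(ar_coeffs, length):
    """Impulse response of the all-pole filter 1 / (1 + a1 z^-1 + ... + ap z^-p)."""
    h = []
    for n in range(length):
        value = 1.0 if n == 0 else 0.0
        for k, a in enumerate(ar_coeffs, start=1):
            if n >= k:
                value -= a * h[n - k]
        h.append(value)
    return h


def synthesize(pulses, ar_coeffs, length):
    """Pass a sparse pulse sequence [(position, amplitude), ...] through the AR filter."""
    h = impulse_response(ar_coeffs, length)
    out = [0.0] * length
    for pos, amp in pulses:
        for i in range(length - pos):
            out[pos + i] += amp * h[i]
    return out


def greedy_multipulse(target, ar_coeffs, num_pulses):
    """Greedy closed-loop search for pulse positions and amplitudes.

    Returns the chosen pulses sorted by position and the remaining error signal.
    """
    n_samples = len(target)
    h = impulse_response(ar_coeffs, n_samples)
    error = list(target)
    pulses = {}
    for _ in range(num_pulses):
        best = None  # (error reduction, position, amplitude)
        for pos in range(n_samples):
            if pos in pulses:
                continue
            span = n_samples - pos
            cross = sum(error[pos + i] * h[i] for i in range(span))
            energy = sum(h[i] * h[i] for i in range(span))  # >= 1 because h[0] == 1
            reduction = cross * cross / energy
            if best is None or reduction > best[0]:
                best = (reduction, pos, cross / energy)
        if best is None:
            break
        _, pos, amp = best
        pulses[pos] = amp
        # Remove the filtered contribution of the new pulse from the error.
        for i in range(n_samples - pos):
            error[pos + i] -= amp * h[i]
    return sorted(pulses.items()), error


# Toy example: a target built from two known pulses (assumed values).
ar = [-0.5]  # illustrative first-order filter 1 / (1 - 0.5 z^-1)
target = synthesize([(2, 1.0), (7, -0.5)], ar, 12)
pulses, residual = greedy_multipulse(target, ar, num_pulses=2)
print(pulses)
```

Because the shifted impulse responses overlap, a greedy search of this kind generally finds the right positions but only approximate amplitudes; a joint re-estimation of all amplitudes after each placement is a common refinement.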

Heiga Zen | Mark J. F. Gales | Kate Knill | Ranniery Maia | Sabine Buchholz
