MBR-PSOLA: Text-To-Speech synthesis based on an MBE re-synthesis of the segments database

Abstract: The use of the Time-Domain Pitch-Synchronous OverLap-Add (TD-PSOLA) algorithm in a Text-To-Speech (TTS) synthesizer is reviewed. Its drawbacks are underlined, and three conditions on the speech database are examined. To satisfy them, a previously described high-quality resynthesis process, based on the well-known Multi-Band Excitation (MBE) model, is developed and enhanced. An important by-product of this operation is that optimal pitch marking becomes automatic. Finally, a temporal interpolation block is added. The resulting Multi-Band Resynthesis Pitch-Synchronous OverLap-Add (MBR-PSOLA) synthesis algorithm supports spectral interpolation between voiced parts of segments with virtually no increase in complexity. It provides the basis of a high-quality TTS synthesizer.
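To make the overlap-add step concrete, the following is a minimal Python sketch of pitch-synchronous overlap-add on a voiced segment. It is an illustration under stated assumptions, not the authors' implementation: it assumes the segment has a constant, known pitch period, so that pitch marks fall on a regular grid and need no detection, and it assumes a two-period Hann window. The function name `psola_constant_pitch` and its parameters are hypothetical.

```python
import numpy as np

def psola_constant_pitch(signal, period, new_period):
    """Re-space pitch-synchronous frames from `period` to `new_period` samples.

    Assumption: `signal` is voiced with a constant pitch period of `period`
    samples, so pitch marks are simply placed at multiples of `period`.
    """
    signal = np.asarray(signal, dtype=float)
    # Pitch marks on a regular grid; each frame spans one period on either side.
    marks = np.arange(period, len(signal) - period + 1, period)
    if len(marks) == 0:
        raise ValueError("signal must contain at least two pitch periods")
    # Periodic Hann window of length 2 * period; shifted copies spaced by
    # `period` sum to one, so unmodified resynthesis is transparent.
    window = np.hanning(2 * period + 1)[:-1]
    out = np.zeros((len(marks) - 1) * new_period + 2 * period)
    for k, m in enumerate(marks):
        frame = signal[m - period:m + period] * window
        start = k * new_period  # synthesis position of frame k
        out[start:start + 2 * period] += frame
    return out

# Usage: a 100-sample-period tone re-spaced to an 80-sample period (higher pitch).
tone = np.sin(2 * np.pi * np.arange(1000) / 100)
same = psola_constant_pitch(tone, period=100, new_period=100)
higher = psola_constant_pitch(tone, period=100, new_period=80)
```

With `new_period` equal to `period`, the interior of the output reproduces the input, because the windows sum to one. With a different `new_period`, the frames are re-spaced and the fundamental period changes; this sketch applies no energy normalization and leaves the duration uncompensated.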
