A mixed-excitation vocoder based on exact analysis of harmonic components

A new analysis-synthesis algorithm has been developed for high-quality diphone speech synthesis, based on accurate measurement of the mixture of periodic and noise components in speech. Input speech is analysed pitch-synchronously, using a pitch estimate refined by 'first-harmonic filtering'. The accurate pitch forms the basis for a Discrete Fourier Transform (DFT) that yields exact amplitudes and phases of all harmonics. For each harmonic a 'factor of noisiness' is calculated from the phase derivatives between two successive refined pitch periods. In unvoiced speech the conventional amplitude spectrum is determined, and the 'factor of noisiness' of every harmonic is set to its maximum. In the synthesis part, the phase of each harmonic is composed of an initial value plus a random value scaled by the 'factor of noisiness' determined in the analysis. Overlap-add of the inverse Fourier transforms completes the synthesis. The method yields an audible improvement in speech synthesis quality. Phases can be manipulated to achieve an optimum fit at diphone boundaries, and large modifications in duration and pitch are possible without loss of naturalness.
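The analysis-synthesis loop described above can be illustrated with a minimal NumPy sketch. It rests on several simplifying assumptions that are not taken from the abstract: the pitch period is a constant integer number of samples (no 'first-harmonic filtering' or per-period refinement); the 'factor of noisiness' is taken as the wrapped phase change of each harmonic between consecutive periods, divided by pi so that it lies in [0, 1]; the random phase term is uniform on (-pi, pi); and overlap-add uses a two-period periodic Hann window at a hop of one period. The function names `analyse`, `noisiness` and `synthesise` are hypothetical.

```python
import numpy as np


def analyse(x, period):
    """Pitch-synchronous DFT: one frame per pitch period.

    Assumption: constant, integer `period` in samples. Because each frame
    spans exactly one period, rfft bin k is harmonic k of the fundamental.
    Returns complex spectra of shape (n_frames, period // 2 + 1).
    """
    n_frames = len(x) // period
    frames = x[: n_frames * period].reshape(n_frames, period)
    return np.fft.rfft(frames, axis=1)


def noisiness(spectra):
    """Hypothetical 'factor of noisiness' per harmonic, in [0, 1].

    For a strictly periodic signal, each harmonic has the same phase at the
    start of successive periods, so the phase change between frames is zero.
    The wrapped phase change is mapped linearly to [0, 1] (0 = harmonic,
    1 = noise-like). This mapping is an assumption, not the paper's formula.
    Needs at least two frames.
    """
    phase = np.angle(spectra)
    dphi = np.angle(np.exp(1j * (phase[1:] - phase[:-1])))  # wrap to (-pi, pi]
    nu = np.abs(dphi) / np.pi
    return np.vstack([nu, nu[-1:]])  # last frame reuses the previous estimate


def synthesise(spectra, nu, period, rng):
    """Phase = analysed phase + nu * U(-pi, pi); then overlap-add.

    Each inverse DFT gives one period; it is repeated to two periods,
    weighted by a periodic Hann window and added at a hop of one period
    (two periodic Hann windows shifted by half their length sum to 1).
    """
    n_frames = spectra.shape[0]
    jitter = rng.uniform(-np.pi, np.pi, size=spectra.shape)
    scale = np.zeros_like(nu)
    top = (period - 1) // 2  # highest harmonic strictly below Nyquist
    scale[:, 1 : top + 1] = nu[:, 1 : top + 1]  # leave DC (and Nyquist) untouched
    new_spec = spectra * np.exp(1j * scale * jitter)
    periods = np.fft.irfft(new_spec, n=period, axis=1)

    n = np.arange(2 * period)
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / (2 * period))  # periodic Hann
    out = np.zeros((n_frames + 1) * period)
    for m, p in enumerate(periods):
        out[m * period : (m + 2) * period] += window * np.tile(p, 2)
    return out
```

With a strictly periodic input the phase changes vanish, so `nu` is zero for the harmonics present and the interior of the output (away from the first and last period) reproduces the input; with noise-like input the phases are randomised in proportion to `nu`. For unvoiced frames, as the abstract describes, one would simply set `nu` to 1 for all harmonics instead of estimating it.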