Removing phase mismatches in concatenative speech synthesis

Concatenation of acoustic units is used in most currently available text-to-speech systems. While this approach yields higher intelligibility and naturalness than synthesis-by-rule, it must cope with the problems of concatenating acoustic units that were recorded in a different order. One important issue in concatenation is the synchronization of speech frames or, in other words, inter-frame coherence. This paper presents a novel method for the synchronization of signals, with applications to speech synthesis. The method is based on the notion of center of gravity applied to speech signals. It is an off-line approach: synchronization is performed during analysis, so it adds no computational burden at synthesis time. The method has been tested with the Harmonic plus Noise Model (HNM) on many large speech databases. The resulting synthetic speech is free of phase mismatch (inter-frame incoherence) problems.
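To make the center-of-gravity idea concrete, the following is a minimal sketch and not the formulation used in this paper, which the abstract does not specify. It assumes one common definition, the energy-weighted temporal centroid of a frame, and moves an analysis mark to that point. The names `center_of_gravity`, `resync_mark`, and `half_width` are hypothetical and chosen only for illustration.

```python
def center_of_gravity(frame):
    """Energy-weighted temporal centroid of a frame, in samples from its start.

    Assumption: energy weighting (s * s) is used here for illustration only.
    """
    energy = [s * s for s in frame]
    total = sum(energy)
    if total == 0.0:
        # Silent frame: fall back to the geometric midpoint.
        return (len(frame) - 1) / 2.0
    return sum(n * e for n, e in enumerate(energy)) / total


def resync_mark(mark, signal, half_width):
    """Move an analysis mark to the center of gravity of the window around it."""
    start = max(0, mark - half_width)
    end = min(len(signal), mark + half_width + 1)
    return start + int(round(center_of_gravity(signal[start:end])))


# Toy signal: a single pulse at sample 12.
signal = [0.0] * 20
signal[12] = 1.0

# Two different initial marks near the pulse are moved to the same position.
mark_a = resync_mark(8, signal, 5)
mark_b = resync_mark(11, signal, 5)
```

In this toy case both marks land on the pulse, which is the property that matters for concatenation: frames taken from different recordings are referenced to a position determined by the signal itself rather than by where the mark was first placed. Because the shift depends only on the recorded signal, it can be computed once during analysis.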