Removal of Spectral Discontinuity in Concatenated Speech Waveform

Speech synthesis systems that concatenate recorded speech units are currently very popular. Because they generate speech by joining together the waveforms of different speech units, these systems are known for producing high-quality, natural-sounding speech. This method of speech generation is quite practical. However, the speech units being concatenated may have different spectra on either side of a concatenation point. Such mismatches give rise to spectral discontinuities in the concatenated speech waveform, which can be very distracting to the listener and degrade the overall quality of the output speech. This paper proposes a speech signal processing technique that addresses the problem of spectral discontinuity in concatenative waveform synthesis. The technique post-processes the synthesized speech waveform in the time domain. It was applied to single-channel Punjabi wave audio files created by concatenating different Punjabi syllables. In a listening test conducted to evaluate the proposed technique, spectral discontinuity was observed to be reduced to a large extent, and the output speech sounded more natural, with less audible noise.
