Most Indian-language text-to-speech (TTS) synthesis systems designed to date are based on the concatenation of acoustic units. The main challenge is selecting appropriate units and concatenating them smoothly. Because of the limitations of current automated techniques based on the Hidden Markov Model (HMM) and Dynamic Time Warping (DTW), manual verification and labeling are often essential. This paper proposes automatic placement of phoneme boundaries in a speech waveform using an explicit statistical model of the phoneme boundary. We apply the Harmonic plus Noise Model (HNM) in the first step, and then refine the boundary placement by searching a region near the estimated boundary for the best match with a predefined boundary model, using a technique such as ESNOLA. The technique is applied for effective concatenation, which results in smooth output. Studies show that HNM is capable of synthesizing all vowels and diphones with good quality, which can remarkably reduce the size of the database. Further, the pitch-synchronous analysis and the Glottal Closure Instants (GCI) are accurately calculated. The quality of the synthesized speech improves if these units are obtained from the glottal signal rather than from processing the speech signal. The database has to be developed for vowel-consonant-vowel (VCV) units for all Indian languages, as we have done in our case study for Oriya, one of the official languages of the Republic of India.
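The boundary refinement step described above, a local search around an initial estimate for the best match with a boundary model, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `refine_boundary`, the frame-level feature matrix, and the use of a fixed template with squared Euclidean distance in place of the statistical boundary model are all assumptions made for illustration.

```python
import numpy as np

def refine_boundary(features, template, initial, radius):
    """Search near an initial boundary estimate for the best template match.

    features: (T, D) array of per-frame features (hypothetical input).
    template: (W, D) array standing in for the boundary model (assumption).
    initial:  frame index of the first-pass boundary estimate.
    radius:   number of frames to search on each side of the estimate.
    """
    width = template.shape[0]
    half = width // 2
    # Keep every candidate window fully inside the feature matrix.
    lo = max(half, initial - radius)
    hi = min(features.shape[0] - (width - half), initial + radius)
    best_idx, best_cost = initial, np.inf
    for t in range(lo, hi + 1):
        window = features[t - half : t - half + width]
        cost = float(np.sum((window - template) ** 2))
        if cost < best_cost:
            best_idx, best_cost = t, cost
    return best_idx

# Toy signal: a single feature that steps from 0 to 1 at frame 50.
features = np.concatenate([np.zeros(50), np.ones(50)])[:, None]
template = np.array([0.0, 0.0, 1.0, 1.0])[:, None]
print(refine_boundary(features, template, initial=46, radius=10))  # 50
```

In this toy case the first-pass estimate (frame 46) is off by four frames, and the local search moves it to the frame where the template fits exactly. A statistical boundary model would replace the squared-distance cost with a likelihood score, but the search structure would stay the same.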