Duration Modelling Using Neural Networks for Hindi TTS System Considering Position of Syllable in a Word

Abstract: The main goal of duration modelling is to capture the duration pattern of natural speech, taking into account the various features that affect that pattern. Accurate estimation of segmental durations plays a vital role in natural-sounding text-to-speech (TTS) synthesis. The syllable is chosen as the basic unit primarily because Indian languages are syllable-centered. This paper presents a novel text-processing approach and a syllable-based, data-driven model of segmental duration for Hindi, using feedforward neural networks. The effectiveness of the system is demonstrated by synthesizing natural-sounding speech for Hindi, an official language of India.