Phonetic and Prosodically Rich Transcribed Speech Corpus in Indian Languages: Bengali and Odia

In this paper, we introduce a speech corpus in two Indian languages, Bengali and Odia, that provides phonetic and prosodic information. Phonetics and prosody play a vital role in human speech perception, so studying them systematically supports a range of speech processing tasks. Motivated by this, we have developed the Phonetic and Prosodically Rich Transcribed (PPRT) speech corpus in Bengali and Odia. The corpus comprises ten hours of read speech, five hours of conversational speech, and five hours of extempore speech. The database has been transcribed using the International Phonetic Alphabet (IPA) to represent all possible phoneme variations. Along with the phonetic transcription, prosodic information is represented: duration patterns of syllables, intonation patterns of phrases, and break patterns within and across phrases.