Romanian Spoken Language Resources and Annotation for Speaker Independent Spontaneous Speech Recognition

This paper presents studies and early results aimed at building a robust spontaneous speech recognition system for the Romanian language. We propose solutions to several issues that arose during this work, such as building a large and accurate database within a reasonable time. A short description of the database is given, together with statistics that show its evolution over several stages of the project. An embedded training technique was used to train the triphones. Because of this, the alignment problem was studied and a solution to it is proposed. The ultimate goal of this work is to obtain substantial speech recognition results for Romanian that can serve as a baseline for further work.