Automatic transcription and speech recognition of Romanian corpus RO-GRID

The results reported in this paper assess the ability of a Hidden Markov Model (HMM)-based method to generate accurate and reliable automatic phone-level transcriptions for a small-vocabulary speech corpus such as RO-GRID. The system requires only an orthographic transcription of the target corpus and can be bootstrapped from models trained on only a small amount of data from the transcribed corpus. For this purpose, a toolbox for automatic time-aligned phone transcription has been developed, tested on the Romanian corpus and also validated on an English corpus. Transcription quality is judged by evaluating the statistical parameters of the error between the automatic and manual transcriptions. The transcriptions generated by the most reliable system deviate from the average manual transcription by 20 ms on average. The system can also convert the generated transcriptions from HTK format into Praat format for further manipulation of the speech signal.
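The abstract names two concrete operations: scoring automatic phone boundaries against a manual reference, and exporting HTK labels to Praat. A minimal Python sketch of both follows. It is not the authors' toolbox, and every function name in it is hypothetical. It assumes HTK `.lab` lines of the form `start end label` with times in 100 ns units, a reference transcription with the same phone sequence (for example, boundaries averaged over several annotators), contiguous segments, and a 20 ms tolerance chosen here only for illustration; the exact statistics used in the paper are not stated in this section.

```python
from statistics import mean, pstdev

# Hypothetical helpers for illustration only, not the authors' toolbox.
# HTK label files express times as integers in units of 100 ns.
HTK_UNITS_PER_SECOND = 10_000_000
HTK_UNITS_PER_MS = 10_000


def parse_htk_lab(text):
    """Parse 'start end label' lines into (start, end, label) tuples.

    Times stay as integer HTK units. Lines with fewer than three fields
    (MLF headers, '.' separators, label-only lines) are skipped; extra
    fields such as log-likelihood scores are ignored.
    """
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        segments.append((int(fields[0]), int(fields[1]), fields[2]))
    return segments


def boundaries(segments):
    """Boundary times: every segment start plus the final segment end."""
    return [start for start, _, _ in segments] + [segments[-1][1]]


def boundary_errors_ms(auto, manual):
    """Signed boundary errors (automatic minus manual) in milliseconds.

    Assumes both transcriptions share the same phone sequence, as in
    forced alignment from the same orthographic transcript.
    """
    if not auto or not manual:
        raise ValueError("empty transcription")
    if [s[2] for s in auto] != [s[2] for s in manual]:
        raise ValueError("automatic and manual phone sequences differ")
    return [(a - m) / HTK_UNITS_PER_MS
            for a, m in zip(boundaries(auto), boundaries(manual))]


def error_statistics(errors_ms, tolerance_ms=20.0):
    """Summary statistics of the boundary error distribution."""
    abs_errors = [abs(e) for e in errors_ms]
    return {
        "mean_abs_ms": mean(abs_errors),
        "bias_ms": mean(errors_ms),
        "std_ms": pstdev(errors_ms),
        "within_tolerance": sum(e <= tolerance_ms for e in abs_errors) / len(abs_errors),
    }


def to_textgrid(segments, tier_name="phones"):
    """Render contiguous segments as a one-tier Praat TextGrid (long text format)."""
    def sec(t):
        return t / HTK_UNITS_PER_SECOND

    def quote(s):
        # Praat escapes a double quote inside a string by doubling it.
        return '"' + s.replace('"', '""') + '"'

    xmin, xmax = sec(segments[0][0]), sec(segments[-1][1])
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        f"xmin = {xmin}",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f"        name = {quote(tier_name)}",
        f"        xmin = {xmin}",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(segments)}",
    ]
    for i, (start, end, label) in enumerate(segments, start=1):
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {sec(start)}",
            f"            xmax = {sec(end)}",
            f"            text = {quote(label)}",
        ]
    return "\n".join(lines) + "\n"
```

Keeping times as integer HTK units until the final conversion avoids floating-point drift in the error values. Signed errors are kept so that a systematic early or late bias of the aligner shows up in `bias_ms`, while `mean_abs_ms` corresponds to the kind of average deviation quoted above. Note that Praat expects the intervals of a tier to tile the whole time range, so gaps between HTK segments would need to be filled (for example with empty-label intervals) before export.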