Progress in Transcription of Vietnamese Broadcast News

In this paper, we report on our research and progress in Vietnamese Broadcast News transcription, with an emphasis on efficient modeling for more accurate recognition. In acoustic modeling, accuracy was improved through a re-alignment process that considers all pronunciations of each word and outputs the one that best matches the acoustic data. The effectiveness of acoustic adaptation is greatly increased through unsupervised clustering of the test data. In language modeling, we explored the use of non-broadcast-news training data as well as topic adaptation. Experimental results showed significant improvements: the word accuracy rate (WAR) measured on a 1-hour test set was 84.2%, an absolute improvement of 5.4% over the baseline result (Nguyen and Vu, 2006; Huynh et al., 2005).
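To make the reported figures concrete, the following minimal sketch works through the arithmetic behind them. It assumes the conventional definition of word accuracy, 100 * (N - S - D - I) / N, which the abstract does not state explicitly. The function names and the error counts in the usage note are hypothetical and serve only as illustration. Only the 84.2% accuracy and the 5.4% absolute gain come from the text.

```python
def word_accuracy(n_ref, substitutions, deletions, insertions):
    """Word accuracy in percent, assuming the conventional definition
    100 * (N - S - D - I) / N over N reference words."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return 100.0 * (n_ref - substitutions - deletions - insertions) / n_ref


def baseline_from_gain(accuracy, absolute_gain):
    """Baseline accuracy implied by a reported accuracy and absolute gain."""
    return accuracy - absolute_gain


def relative_error_reduction(baseline_acc, new_acc):
    """Relative reduction in word error (percent), taking error = 100 - accuracy."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


# Figures taken from the abstract: 84.2% accuracy, 5.4% absolute gain.
baseline = baseline_from_gain(84.2, 5.4)
print(round(baseline, 1))                                   # 78.8
print(round(relative_error_reduction(baseline, 84.2), 1))   # 25.5
```

Under these assumptions, the implied baseline accuracy is 78.8%, and the 5.4% absolute gain corresponds to roughly a 25.5% relative reduction in word error. As a purely hypothetical illustration of the definition, `word_accuracy(1000, 100, 30, 28)` yields 84.2.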
