Improved acoustic modeling for transcribing Arabic broadcast data

ABSTRACT This paper summarizes our recent progress in improving the automatic transcription of Arabic broadcast audio data, and some efforts to address the challenges of broadcast conversational speech. Our efforts are aimed at improving the acoustic, pronunciation and language models, taking into account specificities of the Arabic language. In previous work we demonstrated that explicit modeling of short vowels improved recognition performance, even when producing non-vocalized hypotheses. In addition to modeling short vowels, consonant gemination and nunation are now explicitly modeled, alternative pronunciations have been introduced to better represent dialectal variants, and a duration model has been integrated. In order to facilitate training on Arabic audio data with non-vocalized transcripts, a generic vowel model has been introduced. Compared with the previous system (used in the 2006 GALE evaluation), the word error rate has been reduced by over 10% relative.

Index Terms – Speech recognition, Arabic, broadcast news, broadcast conversations
