Lexical and phonetic modeling for Arabic automatic speech recognition

In this paper, we describe the use of either words or morphemes as lexical modeling units and of either graphemes or phonemes as phonetic modeling units for Arabic automatic speech recognition (ASR). We designed four Arabic ASR systems: two word-based and two morpheme-based. Experimental results show that the four systems individually achieve comparable, state-of-the-art performance, with the more sophisticated morpheme-based system tending to perform best. The systems also appear to complement one another well: combining them within the ROVER system combination framework yields substantially improved results.
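
As a rough illustration of how outputs from several recognizers can be combined, the sketch below shows only the frequency-voting step of a ROVER-style combination. It is not the configuration used in this work. It assumes the hypotheses have already been aligned slot by slot (full ROVER first builds this alignment by dynamic programming), and the `NULL` marker, the `vote` function, and the example tokens are hypothetical names introduced here for illustration.

```python
from collections import Counter

# Hypothetical marker for an empty slot in the alignment (assumption of this sketch).
NULL = "@"


def vote(aligned_hyps):
    """Majority vote over pre-aligned hypotheses.

    aligned_hyps: list of equal-length token lists, one per system,
    where position i in every list refers to the same aligned slot.
    Returns the combined token sequence with NULL winners dropped.
    """
    combined = []
    for slot in zip(*aligned_hyps):
        # Most frequent token in this slot; ties go to the first token seen.
        winner, _count = Counter(slot).most_common(1)[0]
        if winner != NULL:
            combined.append(winner)
    return combined


# Invented outputs from four systems, already aligned (not real data).
hyps = [
    ["the", "cat", "sat", NULL],
    ["the", "cat", "sat", "down"],
    ["a", "cat", "sat", "down"],
    ["the", "hat", "sat", "down"],
]
combined = vote(hyps)
```

Here each slot is decided by simple frequency of occurrence; a fuller implementation could also weight the votes by per-word confidence scores. When systems are morpheme-based, their outputs would need to be mapped back to words before such a vote.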
