Automatic Diacritization of Arabic for Acoustic Modeling in Speech Recognition

Automatic recognition of Arabic dialectal speech is a challenging task because Arabic dialects are essentially spoken varieties. Only few dialectal resources are available to date; moreover, most available acoustic data collections are transcribed without diacritics. Such a transcription omits essential pronunciation information about a word, such as short vowels. In this paper we investigate various procedures that enable us to use such training data by automatically inserting the missing diacritics into the transcription. These procedures use acoustic information in combination with different levels of morphological and contextual constraints. We evaluate their performance against manually diacritized transcriptions. In addition, we demonstrate the effect of their accuracy on the recognition performance of acoustic models trained on automatically diacritized training data.

[1]  Jonathan G. Fiscus,et al.  A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER) , 1997, 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.

[2]  Ya'akov Gal An HMM Approach to Vowel Restoration in Arabic and Hebrew , 2002, SEMITIC@ACL.

[3]  Mitch Weintraub,et al.  Explicit word error minimization in n-best list rescoring , 1997, EUROSPEECH.

[4]  Vassilios Digalakis,et al.  Speaker adaptation using constrained estimation of Gaussian mixtures , 1995, IEEE Trans. Speech Audio Process..

[5]  Salim Roukos,et al.  Audio-Indexing For Broadcast News , 1998, TREC.

[6]  Heinrich Niemann,et al.  Speaker Adaptation Using , 1992 .

[7]  Dimitra Vergyri,et al.  Cross-dialectal acoustic data sharing for Arabic speech recognition , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8]  Vassilios Digalakis,et al.  Genones: optimizing the degree of mixture tying in a large vocabulary hidden Markov model based speech recognizer , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[9]  Geoffrey Zweig,et al.  The graphical models toolkit: An open source software system for speech and time-series processing , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10]  Yousif A. El-Imam Phonetization of Arabic: rules and algorithms , 2004, Comput. Speech Lang..