Generation of a phonetic transcription for modern standard Arabic: A knowledge-based model

Abstract This paper outlines a comprehensive system for automatically generating a phonetic transcription of a given Arabic text that closely matches the pronunciation of speakers. The system is based on a set of language-dependent pronunciation rules that convert fully diacriticised Arabic text into the actual sounds, along with a lexicon for exceptional words. Conversion is a two-phase process: one-to-one grapheme-to-phoneme conversion, followed by phoneme-to-allophone conversion using a set of phonological rules. These rules operate on the phonemes and convert them to the actual sounds, taking into account the neighbouring phones or the containing syllable or word. The system was developed to support a robust Automatic Arabic Speech Recognition (AASR) system that can handle the speech variation resulting from the mismatch between the text and its pronunciation. We anticipate that it could also be used to produce natural-sounding speech in an Arabic text-to-speech (ATTS) system, but we have not tested it extensively in that application.
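The two-phase process described in the abstract can be illustrated with a minimal sketch. It is not the paper's rule set: the mapping table, the exception entry, the two phonological rules (/n/ realised as [m] before /b/, and /a/ backed to [ɑ] next to an emphatic consonant) and all function names are assumptions chosen for illustration, and a simplified Latin romanisation stands in for fully diacriticised Arabic script.

```python
# Toy illustration of a two-phase grapheme -> phoneme -> allophone pipeline.
# All tables, rules, and names below are hypothetical, not taken from the paper.

# Phase 1 table: one symbol of the (assumed) romanisation maps to one phoneme.
G2P = {
    "m": "m", "n": "n", "b": "b", "r": "r", "k": "k", "t": "t",
    "S": "sˤ",  # emphatic /s/
    "a": "a", "i": "i", "u": "u",
}

# Lexicon for exceptional words whose spelling does not predict their sounds
# (illustrative entry only).
EXCEPTIONS = {
    "ha*aA": ["h", "aː", "ð", "aː"],
}

EMPHATICS = {"sˤ"}


def graphemes_to_phonemes(word):
    """Phase 1: lexicon lookup first, otherwise one-to-one conversion."""
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    return [G2P[ch] for ch in word]


def apply_phonological_rules(phonemes):
    """Phase 2: rewrite phonemes as allophones based on their neighbours."""
    phones = list(phonemes)
    for i, p in enumerate(phonemes):
        prev_p = phonemes[i - 1] if i > 0 else None
        next_p = phonemes[i + 1] if i + 1 < len(phonemes) else None
        # Assumed rule 1: /n/ assimilates to [m] before /b/.
        if p == "n" and next_p == "b":
            phones[i] = "m"
        # Assumed rule 2: /a/ is backed to [ɑ] next to an emphatic consonant.
        elif p == "a" and (prev_p in EMPHATICS or next_p in EMPHATICS):
            phones[i] = "ɑ"
    return phones


def transcribe(word):
    """Run both phases on a single word."""
    return apply_phonological_rules(graphemes_to_phonemes(word))


if __name__ == "__main__":
    for w in ["minbar", "Sabr", "ha*aA"]:
        print(w, transcribe(w))
```

Keeping the two phases as separate functions mirrors the separation the abstract draws between the context-free mapping and the context-sensitive rules, so a rule that needs syllable- or word-level context could be added to the second phase without touching the first.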
