The impact of phonological rules on Arabic speech recognition

Pronunciation variation is a well-known phenomenon that has been widely investigated in automatic speech recognition (ASR). Knowledge-based phonological rules are generally used to capture the actual phonetic realization of words, minimizing the mismatch between the ASR dictionary and the phonetic content of the speech signal. Several studies have applied such rules to Arabic ASR systems; however, little research has measured the contribution of each individual rule. In this paper, we aim to determine the effect of each rule and to identify the rules that have no influence. We used the Carnegie Mellon University PocketSphinx speech recognizer with a new in-house Modern Standard Arabic speech corpus containing 19 h of training data and 3.7 h of test data. We evaluated the effect of three well-known rules (Shadda, Tanween, and the solar letters). The experimental results show no clear evidence that using phonological rules for ASR dictionary adaptation improves performance for within-word pronunciation variation. These results may indicate a need to reconsider this approach or to turn to other aspects of ASR performance, such as cross-word pronunciation variation and the optimal phoneme set for Arabic.
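
To make the three rules concrete, the following is a minimal sketch of how they could be applied when generating dictionary pronunciations. It is an illustration only, not the procedure used in the paper. It assumes fully diacritized input in Buckwalter transliteration (shadda as `~`, tanween as `F`, `N`, `K`, sukun as `o`), assumes that a shadda immediately follows its consonant, treats any word-initial `Al` as the definite article, and uses the transliteration characters themselves as phone symbols. The function name `to_phones` is hypothetical.

```python
# Illustrative sketch only: assumes diacritized Buckwalter input and
# uses Buckwalter characters as stand-in phone symbols.

# The 14 solar consonants in Buckwalter transliteration.
SOLAR = set("tvd*rzs$SDTZln")

# Tanween: a short vowel followed by /n/ at the end of a noun.
TANWEEN = {"F": ["a", "n"], "N": ["u", "n"], "K": ["i", "n"]}


def to_phones(word):
    """Map one diacritized Buckwalter word to a list of phone symbols."""
    phones = []
    i = 0

    # Solar-letter rule: the /l/ of the definite article is not pronounced
    # before a solar consonant (assumes initial "Al" is the article).
    if word.startswith("Al") and len(word) > 2:
        phones.append("a")
        if word[2] not in SOLAR:
            phones.append("l")
        i = 2

    while i < len(word):
        ch = word[i]
        if ch == "~":
            # Shadda rule: geminate the preceding consonant.
            if phones:
                phones.append(phones[-1])
        elif ch in TANWEEN:
            # Tanween rule: expand to vowel + /n/.
            phones.extend(TANWEEN[ch])
        elif ch == "o":
            # Sukun marks the absence of a vowel; it has no phone.
            pass
        else:
            phones.append(ch)
        i += 1
    return phones
```

For example, under these assumptions `to_phones("Al$~amosu")` drops the article's /l/ and doubles the solar consonant, while `to_phones("Alqamaru")` keeps the /l/ because `q` is a lunar letter.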
