Development of the Arabic Loria Automatic Speech Recognition system (ALASR) and its evaluation for Algerian dialect

Abstract. This paper addresses the development of an Automatic Speech Recognition (ASR) system for Modern Standard Arabic (MSA) and its extension to the Algerian dialect. The Algerian dialect differs markedly from the Arabic dialects of the Middle East because it is heavily influenced by French. We first present the new ASR system, named ALASR (Arabic Loria Automatic Speech Recognition). Its acoustic model is based on a deep neural network (DNN) approach, and its language model is a classical n-gram. Several options are investigated to find the best combination of models and parameters. ALASR achieves good results for MSA, with a word error rate (WER) of 14.02%, but it collapses on a 70-minute Algerian dialect data set, where the WER reaches 89%. To take into account the influence of French on the Algerian dialect, we combine two acoustic models in ALASR: the original MSA model and a French model trained on the ESTER corpus. This solution was adopted because no transcribed speech data are available for the Algerian dialect. The combination leads to a substantial absolute reduction in WER of 24%.
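
Since the results above are all reported as WER, a minimal sketch of how this metric is conventionally computed may help: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the scoring tool used for ALASR.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER: word-level edit distance / number of reference words."""
    ref = reference.split()   # assumption: words are whitespace-separated
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, the WER can exceed 100% when the hypothesis contains many spurious words, which is worth keeping in mind when reading very high error rates such as those reported on dialectal data.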
