Investigating Data Sharing in Speech Recognition for an Under-Resourced Language: The Case of Algerian Dialect

The Arabic language has many varieties, including its standard written form, Modern Standard Arabic (MSA), and its spoken forms, the dialects. These dialects are representative examples of under-resourced languages, for which automatic speech recognition remains an open problem. To address this issue, we recorded several hours of spoken Algerian dialect and used them to train a baseline model. We then improved this model by leveraging data from other languages that influence the dialect, pooling their data into one large corpus and investigating three approaches: multilingual training, multitask learning, and transfer learning. The best performance was achieved with a limited, balanced amount of acoustic data from each additional language relative to the amount of dialect data. This approach improved the word error rate by 3.8% compared to the baseline system trained only on the dialect data.
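
The abstract's key finding is that the best results came from pooling only a limited, balanced amount of acoustic data from each additional language, relative to the amount of dialect data. The sketch below illustrates one way such a balanced multilingual pool could be assembled before acoustic-model training; it is a minimal sketch, and the corpus names, utterance lists, and the 1:1 duration ratio are illustrative assumptions rather than details taken from the paper.

```python
import random

# Hypothetical per-language utterance lists: each entry is (utterance_id, duration_seconds).
# These identifiers and sizes are illustrative only; they do not come from the paper.
dialect_utts = [("alg_%04d" % i, 5.0) for i in range(1000)]            # Algerian dialect (target)
auxiliary = {
    "msa":    [("msa_%05d" % i, 5.0) for i in range(20000)],           # Modern Standard Arabic
    "french": [("fra_%05d" % i, 5.0) for i in range(30000)],           # French
}

def total_hours(utts):
    """Sum utterance durations and convert seconds to hours."""
    return sum(dur for _, dur in utts) / 3600.0

def balanced_pool(target, aux_corpora, ratio=1.0, seed=0):
    """Cap each auxiliary corpus at `ratio` times the target-dialect duration,
    so that no single additional language dominates the multilingual training pool."""
    rng = random.Random(seed)
    budget = total_hours(target) * ratio
    pool = list(target)
    for lang, utts in aux_corpora.items():
        shuffled = utts[:]
        rng.shuffle(shuffled)
        kept, hours = [], 0.0
        for utt_id, dur in shuffled:
            if hours + dur / 3600.0 > budget:
                break
            kept.append((utt_id, dur))
            hours += dur / 3600.0
        print(f"{lang}: kept {len(kept)} utterances ({hours:.1f} h of {total_hours(utts):.1f} h)")
        pool.extend(kept)
    return pool

training_pool = balanced_pool(dialect_utts, auxiliary, ratio=1.0)
print(f"Total pooled data: {total_hours(training_pool):.1f} h")
```

In this sketch, the same capping logic applies whether the pooled data feeds a single multilingual model, auxiliary tasks in multitask learning, or a source model that is later fine-tuned on the dialect data alone.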
