Multi-Reference Evaluation for Dialectal Speech Recognition System: A Study for Egyptian ASR

Dialectal Arabic has no standard orthographic representation. This makes it difficult to evaluate an Automatic Speech Recognition (ASR) system for a dialect, because the reference transcription can vary widely from one transcriber to another. We propose an approach for evaluating dialectal speech recognition using multiple references. For each recognized speech segment, we ask five different users to transcribe the speech. We combine the alignments for the multiple references and use the combined alignment to report a modified version of Word Error Rate (WER). This approach accepts a recognized word if any of the references transcribed it in the same form. The method proved more effective at capturing correctly recognized words that have multiple acceptable spellings. The WER measured against each of the five references individually ranged from 76.4% to 80.9%. When all references were combined, the Multi-Reference WER (MR-WER) was 53%.
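The idea of accepting a recognized word when any reference agrees with it can be illustrated with a minimal Python sketch. This is not the exact MR-WER formula, which the abstract does not spell out. It assumes a plain Levenshtein word alignment per reference and reports a simplified, hypothetical `unaccepted_rate` (the share of recognized words matched by no reference). The function names and the transliterated example words are assumptions made for illustration.

```python
from typing import List, Tuple


def align(ref: List[str], hyp: List[str]) -> Tuple[int, List[bool]]:
    """Levenshtein-align hyp against ref.

    Returns the word-level edit distance and, for each hypothesis word,
    whether the alignment pairs it with an identical reference word.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match or substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion
    # Backtrace one optimal path to find which hypothesis words matched.
    matched = [False] * m
    i, j = n, m
    while i > 0 and j > 0:
        cost = 0 if ref[i - 1] == hyp[j - 1] else 1
        if dp[i][j] == dp[i - 1][j - 1] + cost:
            if cost == 0:
                matched[j - 1] = True
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
    return dp[n][m], matched


def single_wer(ref: List[str], hyp: List[str]) -> float:
    """Standard WER against one reference: edit distance / reference length."""
    distance, _ = align(ref, hyp)
    return distance / len(ref)


def unaccepted_rate(refs: List[List[str]], hyp: List[str]) -> float:
    """Simplified multi-reference score (an assumption, not the paper's formula):
    fraction of hypothesis words that no reference alignment matched."""
    accepted = [False] * len(hyp)
    for ref in refs:
        _, matched = align(ref, hyp)
        accepted = [a or b for a, b in zip(accepted, matched)]
    return 1 - sum(accepted) / len(hyp)


# Hypothetical transliterated example: two transcribers spell words differently.
hyp = ["ana", "mesh", "3aref"]
refs = [["ana", "mish", "3aref"], ["ana", "mesh", "3arif"]]
per_reference = [single_wer(r, hyp) for r in refs]
combined = unaccepted_rate(refs, hyp)
```

In this toy case each reference alone penalizes one of the three recognized words, while the combined view accepts all three because every word appears in the same form in at least one reference.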
