Development of a TV Broadcasts Speech Recognition System for Qatari Arabic

A major problem with dialectal Arabic speech recognition is due to the sparsity of speech resources. In this paper, a transfer learning framework is proposed to jointly use a large amount of Modern Standard Arabic (MSA) data and little amount of dialectal Arabic data to improve acoustic and language modeling. The Qatari Arabic (QA) dialect has been chosen as a typical example for an under-resourced Arabic dialect. A wide-band speech corpus has been collected and transcribed from several Qatari TV series and talk-show programs. A large vocabulary speech recognition baseline system was built using the QA corpus. The proposed MSA-based transfer learning technique was performed by applying orthographic normalization, phone mapping, data pooling, acoustic model adaptation, and system combination. The proposed approach can achieve more than 28% relative reduction in WER.

[1]  Slim Abdennadher,et al.  Rapid phonetic transcription using everyday life natural Chat Alphabet orthography for dialectal Arabic speech recognition , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2]  Slim Abdennadher,et al.  Cross-lingual acoustic modeling for dialectal Arabic speech recognition , 2010, INTERSPEECH.

[3]  J. Xu,et al.  Audio Indexing of Arabic broadcast news , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[5]  Philip C. Woodland,et al.  Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models , 1995, Comput. Speech Lang..

[6]  Andreas Stolcke,et al.  Development of a conversational telephone speech recognizer for Levantine Arabic , 2005, INTERSPEECH.

[7]  Vaibhava Goel,et al.  Minimum Bayes-risk automatic speech recognition , 2000, Comput. Speech Lang..

[8]  Brian Kingsbury,et al.  The IBM 2008 GALE Arabic speech transcription system , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[9]  Mark Hasegawa-Johnson,et al.  Cross-Dialectal Data Transferring for Gaussian Mixture Model Training in Arabic Speech Recognition , 2012 .

[10]  Dimitra Vergyri,et al.  Cross-dialectal data sharing for acoustic modeling in Arabic speech recognition , 2005, Speech Commun..

[11]  References , 1971 .

[12]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[13]  Jean-Luc Gauvain,et al.  Speaker adaptation based on MAP estimation of HMM parameters , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[14]  Daniel Povey,et al.  The Kaldi Speech Recognition Toolkit , 2011 .