Automatic Speech Recognition for Tunisian Dialect

Speech recognition for under-resourced languages has been an active field of research over the past decade. The Tunisian Arabic dialect has been chosen as a typical example of an under-resourced Arabic dialect. In this paper, we present our first steps toward building an automatic speech recognition system for the Tunisian dialect. Several acoustic models have been trained using HMM-GMM and HMM-DNN systems. The speech corpus was collected and transcribed from dialogues recorded in the Tunisian Railway Transport Network. The HMM-DNN system yields a substantial relative reduction in word error rate (WER).
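For readers unfamiliar with the metric, the following is a minimal sketch of how WER and a relative WER reduction are commonly computed. It assumes whitespace tokenization and word-level Levenshtein distance. The function names and the sample values in the usage note are hypothetical illustrations, not figures or code from this work.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace; no text normalization
    (diacritics, orthographic variants) is applied.
    """
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

As a usage example with made-up values, `relative_wer_reduction(0.40, 0.30)` evaluates to about 0.25, that is, a 25% relative reduction, even though the absolute difference is only 10 points. Reporting which of the two is meant avoids ambiguity when comparing an HMM-GMM system with an HMM-DNN system.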
