QMDIS: QCRI-MIT Advanced Dialect Identification System

As a continuation of our efforts towards tackling the problem of spoken Dialect Identification (DID) for Arabic languages, we present the QCRI-MIT Advanced Dialect Identification System (QMDIS). QMDIS is an automatic spoken DID system for Dialectal Arabic (DA). In this paper, we report a comprehensive study of the three main components used in the spoken DID task: phonotactic, lexical and acoustic. We use Support Vector Machines (SVMs), Logistic Regression (LR) and Convolutional Neural Networks (CNNs) as backend classifiers throughout the study. We perform all our experiments on a publicly available dataset and present new state-of-the-art results. QMDIS discriminates between the five most widely used dialects of Arabic: namely Egyptian, Gulf, Levantine, North African, and Modern Standard Arabic (MSA). We report≈ 73% accuracy for system combination. All the data and the code used in our experiments are publicly available for research.

[1]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Lukás Burget,et al.  iVector-based prosodic system for language identification , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[3]  Nizar Habash,et al.  MADA + TOKAN : A Toolkit for Arabic Tokenization , Diacritization , Morphological Disambiguation , POS Tagging , Stemming and Lemmatization , 2009 .

[4]  Daniel Povey,et al.  The Kaldi Speech Recognition Toolkit , 2011 .

[5]  James R. Glass,et al.  Unsupervised Learning of Spoken Language with Visual Context , 2016, NIPS.

[6]  James R. Glass,et al.  Lexical modeling for Arabic ASR: a systematic approach , 2014, INTERSPEECH.

[7]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[8]  Xiang Zhang,et al.  Character-level Convolutional Networks for Text Classification , 2015, NIPS.

[9]  Sameer Khurana,et al.  QCRI advanced transcription system (QATS) for the Arabic Multi-Dialect Broadcast media recognition: MGB-2 challenge , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).

[10]  Florian Metze,et al.  Speaker Adaptive Training of Deep Neural Network Acoustic Models Using I-Vectors , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[11]  J. Gonzalez-Dominguez,et al.  Language Identification in Short Utterances Using Long Short-Term Memory (LSTM) Recurrent Neural Networks , 2016, PloS one.

[12]  James R. Glass,et al.  Automatic Dialect Detection in Arabic Broadcast Speech , 2015, INTERSPEECH.

[13]  Mireia Díez,et al.  PLLR features in language recognition system for RATS , 2014, INTERSPEECH.

[14]  Yonatan Belinkov,et al.  A Character-level Convolutional Neural Network for Distinguishing Similar Languages and Dialects , 2016, VarDial@COLING.

[15]  Samantha Wray,et al.  Crowdsource a little to label a lot: labeling a speech corpus of dialectal Arabic , 2015, INTERSPEECH.

[16]  A. Backus Code-switching in conversation: Language, interaction and identity , 2000 .

[17]  Douglas A. Reynolds,et al.  Language Recognition via i-vectors and Dimensionality Reduction , 2011, INTERSPEECH.

[18]  Preslav Nakov,et al.  Findings of the VarDial Evaluation Campaign 2017 , 2017, VarDial.

[19]  Nizar Habash,et al.  Improving the Arabic Pronunciation Dictionary for Phone and Word Recognition with Linguistically-Based Pronunciation Rules , 2009, HLT-NAACL.

[20]  Yousif A. El-Imam Phonetization of Arabic: rules and algorithms , 2004, Comput. Speech Lang..

[21]  Fahim Dalvi,et al.  QCRI @ DSL 2016: Spoken Arabic Dialect Identification Using Textual Features , 2016, VarDial@COLING.

[22]  James R. Glass,et al.  The MGB-2 challenge: Arabic multi-dialect broadcast media recognition , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).

[23]  Chong Wang,et al.  Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.

[24]  James R. Glass,et al.  Non-Negative Factor Analysis of Gaussian Mixture Model Weight Adaptation for Language and Dialect Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[25]  Douglas A. Reynolds,et al.  Approaches to language identification using Gaussian mixture models and shifted delta cepstral features , 2002, INTERSPEECH.