QMDIS: QCRI-MIT Advanced Dialect Identification System

Continuing our work on spoken Dialect Identification (DID) for Arabic, we present the QCRI-MIT Advanced Dialect Identification System (QMDIS), an automatic spoken DID system for Dialectal Arabic (DA). In this paper, we report a comprehensive study of the three main components used in the spoken DID task: phonotactic, lexical, and acoustic. We use Support Vector Machines (SVMs), Logistic Regression (LR), and Convolutional Neural Networks (CNNs) as backend classifiers throughout the study. We perform all our experiments on a publicly available dataset and present new state-of-the-art results. QMDIS discriminates between the five most widely used dialects of Arabic: Egyptian, Gulf, Levantine, North African, and Modern Standard Arabic (MSA). We report approximately 73% accuracy for the system combination. All data and code used in our experiments are publicly available for research.
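The abstract does not say how the phonotactic, lexical, and acoustic systems are combined. As a purely illustrative sketch, one common option is score-level fusion: each subsystem outputs a posterior over the five dialects, and the posteriors are averaged with per-system weights before taking the argmax. The function names, weights, and score values below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# The five classes named in the abstract.
DIALECTS = ["Egyptian", "Gulf", "Levantine", "North African", "MSA"]


def fuse_scores(
    system_posteriors: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-dialect posteriors from several subsystems.

    This is an assumed fusion rule for illustration, not the paper's method.
    """
    total_weight = sum(weights[name] for name in system_posteriors)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * len(DIALECTS)
    for name, posteriors in system_posteriors.items():
        if len(posteriors) != len(DIALECTS):
            raise ValueError(f"{name}: expected {len(DIALECTS)} scores")
        for i, p in enumerate(posteriors):
            fused[i] += weights[name] * p / total_weight
    return fused


def predict_dialect(fused: List[float]) -> str:
    """Return the dialect with the highest fused score."""
    best = max(range(len(fused)), key=fused.__getitem__)
    return DIALECTS[best]


# Made-up posteriors for a single utterance, one list per subsystem.
example_posteriors = {
    "phonotactic": [0.30, 0.25, 0.20, 0.15, 0.10],
    "lexical": [0.20, 0.40, 0.20, 0.10, 0.10],
    "acoustic": [0.25, 0.35, 0.15, 0.15, 0.10],
}
# Equal weights are an arbitrary choice for the example.
example_weights = {"phonotactic": 1.0, "lexical": 1.0, "acoustic": 1.0}

example_fused = fuse_scores(example_posteriors, example_weights)
example_prediction = predict_dialect(example_fused)
```

In practice the fusion weights would be tuned on held-out data, or replaced by a trained combiner such as logistic regression over the stacked subsystem scores.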
