UTD-CRSS submission for MGB-3 Arabic dialect identification: Front-end and back-end advancements on broadcast speech

This study presents the systems submitted by the University of Texas at Dallas, Center for Robust Speech Systems (UTD-CRSS) to the MGB-3 Arabic Dialect Identification (ADI) subtask. The task is to discriminate among five varieties of Arabic: Egyptian, Gulf, Levantine, North African, and Modern Standard Arabic. We develop multiple single systems with different front-end representations and back-end classifiers. At the front-end level, Mel-frequency cepstral coefficients (MFCCs) and two types of bottleneck features (BNFs) are studied as inputs to an i-Vector framework. At the back-end level, Gaussian back-end (GB) and Generative Adversarial Network (GAN) classifiers are applied alternately. Our best submission (contrastive) achieves an accuracy of 76.94% on the ADI subtask by augmenting the training data with a randomly chosen part of the development dataset. Further, a post-evaluation correction to the submitted system raises the final accuracy to 79.76%, which represents the best performance achieved so far for the challenge on the test dataset.
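As a rough illustration of the Gaussian back-end idea mentioned above, the following sketch fits one mean per dialect with a single shared variance and assigns a test vector to the dialect with the highest log-likelihood. It is not the submitted system: the diagonal-covariance simplification, the function names, the two-dimensional toy vectors standing in for i-Vectors, and the labels `EGY` and `GLF` are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def train_gaussian_backend(vectors, labels):
    """Fit one mean per class and a shared diagonal variance (an assumed simplification)."""
    dim = len(vectors[0])
    grouped = defaultdict(list)
    for vec, label in zip(vectors, labels):
        grouped[label].append(vec)
    means = {
        label: [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
        for label, vecs in grouped.items()
    }
    # Pooled within-class variance per dimension, floored to avoid division by zero.
    variance = [0.0] * dim
    for vec, label in zip(vectors, labels):
        for d in range(dim):
            variance[d] += (vec[d] - means[label][d]) ** 2
    variance = [max(v / len(vectors), 1e-6) for v in variance]
    return means, variance


def score(vec, means, variance):
    """Return the Gaussian log-likelihood of one vector under each class."""
    const = -0.5 * sum(math.log(2 * math.pi * v) for v in variance)
    return {
        label: const
        - 0.5 * sum((x - m) ** 2 / v for x, m, v in zip(vec, mean, variance))
        for label, mean in means.items()
    }


def classify(vec, means, variance):
    """Pick the class with the highest log-likelihood."""
    scores = score(vec, means, variance)
    return max(scores, key=scores.get)


# Hypothetical toy data: 2-dimensional stand-ins for i-Vectors of two dialects.
train_vectors = [[0.0, 0.1], [0.2, -0.1], [3.0, 3.1], [2.8, 2.9]]
train_labels = ["EGY", "EGY", "GLF", "GLF"]
means, variance = train_gaussian_backend(train_vectors, train_labels)
print(classify([0.1, 0.2], means, variance))
```

A full-covariance variant would replace the per-dimension variance with a pooled covariance matrix; the diagonal form is used here only to keep the sketch dependency-free.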
