Arabic Dialect Identification - 'Is the Secret in the Silence?' and Other Observations

Conversational telephone speech (CTS) collections of Arabic dialects distributed through the Linguistic Data Consortium (LDC) provide an invaluable resource for the development of robust speech systems, including speaker and speech recognition, translation, spoken dialogue modeling, and information summarization. They are also frequently relied on in language identification (LID) and dialect identification (DID) evaluations. The first part of this study attempts to identify the source of the relatively high DID performance on LDC’s Arabic CTS corpora reported in recent literature. It is found that the recordings of each dialect exhibit unique channel and noise characteristics, and that silence regions alone are sufficient for performing reasonably accurate DID. The second part focuses on phonotactic dialect modeling that utilizes phone recognizers and support vector machines (PRSVM). A simple N-gram normalization of PRSVM input supervectors based on hard limiting is introduced and shown to outperform the standard approach used in current LID and DID systems.
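The abstract does not spell out how the hard-limiting normalization is defined, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that hard limiting means clipping each N-gram count at a ceiling (`limit`, where a value of 1 reduces to presence or absence) before scaling the supervector to unit length. The function names, the `limit` parameter, and the toy phone sequence are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngram_counts(phones, n):
    """Count the phone N-grams in a decoded phone sequence."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))


def hard_limited_supervector(phones, vocabulary, n=2, limit=1):
    """Build an N-gram supervector with counts clipped at `limit`.

    ASSUMPTION: "hard limiting" is interpreted here as clipping raw counts
    at a ceiling; the abstract does not define the exact operation.
    """
    counts = ngram_counts(phones, n)
    # Clip each count so that very frequent N-grams cannot dominate.
    clipped = [min(counts.get(gram, 0), limit) for gram in vocabulary]
    # Scale to unit Euclidean length (also an assumption of this sketch).
    norm = sqrt(sum(v * v for v in clipped))
    return [v / norm for v in clipped] if norm > 0 else clipped


# Toy example: a short phone string and a hypothetical bigram vocabulary.
phones = ["a", "b", "a", "b", "k"]
vocabulary = [("a", "b"), ("b", "a"), ("b", "k"), ("k", "a")]
vector = hard_limited_supervector(phones, vocabulary, n=2, limit=1)
```

In this toy example the bigram ("a", "b") occurs twice and ("b", "a") once, yet both receive the same weight after clipping, which is the effect the sketch is meant to show. The resulting vectors would then be passed to an SVM in the usual PRSVM fashion.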
