Supervector pre-processing for PRSVM-based Chinese and Arabic dialect identification

Phonotactic modeling has become a widely used means for speaker, language, and dialect recognition. This paper explores variations to supervector pre-processing for phone recognition-support vector machines (PRSVM) based dialect identification. The aspects studied are: (i) normalization of supervector dimensions in the pre-squashing stage, (ii) impact of alternative squashing functions, and (iii) N-gram selection for supervector dimensionality reduction. In (i) and (ii), we find that several alternatives to commonly used approaches can provide moderate, yet consistent performance improvements. In (iii), a newly proposed dialect salience measure is applied in supervector dimension selection and compared to a common N-gram frequency based selection. The results show a strong correlation between dialect-salience and frequency of occurrence in N-grams. The evaluations in this study are conducted on a corpus of Chinese dialects, a Pan-Arabic corpus, and a set of Arabic CTS corpora.

[1]  Marc A. Zissman,et al.  Comparison of : Four Approaches to Automatic Language Identification of Telephone Speech , 2004 .

[2]  William M. Campbell,et al.  Phonetic Speaker Recognition with Support Vector Machines , 2003, NIPS.

[3]  William M. Campbell,et al.  Language Recognition with Word Lattices and Support Vector Machines , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[4]  Andreas Stolcke,et al.  Effective Arabic Dialect Classification Using Diverse Phonotactic Models , 2011, INTERSPEECH.

[5]  Douglas E. Sturim,et al.  The MITLL NIST LRE 2015 Language Recognition System , 2016, Odyssey.

[6]  Marc A. Zissman,et al.  Automatic dialect identification of extemporaneous conversational, Latin American Spanish speech , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[7]  J. Hansen,et al.  Dialect Classification via Text-Independent Training and Testing for Arabic, Spanish, and Chinese , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[8]  Douglas E. Sturim,et al.  The MITLL NIST LRE 2009 language recognition system , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[9]  Nizar Habash,et al.  Spoken Arabic Dialect Identification Using Phonotactic Modeling , 2009, SEMITIC@EACL.

[10]  Yonghong Yan,et al.  The Design of Backend Classifiers in PPRLM System for Language Identification , 2007, Third International Conference on Natural Computation (ICNC 2007).

[11]  Douglas A. Reynolds,et al.  Approaches to language identification using Gaussian mixture models and shifted delta cepstral features , 2002, INTERSPEECH.

[12]  Daniel P. W. Ellis,et al.  Dialect and Accent Recognition Using Phonetic-Segmentation Supervectors , 2011, INTERSPEECH.

[13]  Bin Ma,et al.  A Vector Space Modeling Approach to Spoken Language Identification , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[14]  William M. Campbell,et al.  Experiments with Lattice-based PPRLM Language Identification , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[15]  Yan Shi,et al.  Using an RBF Neural Network to Locate Program Bugs , 2008, 2008 19th International Symposium on Software Reliability Engineering (ISSRE).

[16]  John H. L. Hansen,et al.  Arabic Dialect Identification - 'Is the Secret in the Silence?' and Other Observations , 2012, INTERSPEECH.

[17]  Li-Rong Dai,et al.  The Adaptation Schemes In PR-SVM Based Language Recognition , 2008, 2008 6th International Symposium on Chinese Spoken Language Processing.