Paper information - Foreign accent detection from spoken Finnish using i-vectors

Foreign accent detection from spoken Finnish using i-vectors

I-vector based recognition is a well-established technique in state-of-the-art speaker and language recognition, but its use in dialect and accent classification has received less attention. We present an experimental study of i-vector based dialect classification, with a special focus on foreign accent detection from spoken Finnish. Using the CallFriend corpus, we first study how recognition accuracy is affected by the choice of various i-vector system parameters, such as the number of Gaussians, the i-vector dimensionality, and the dimensionality reduction method. We then apply the same methods to the Finnish national foreign language certificate (FSD) corpus and compare the results to a traditional Gaussian mixture model - universal background model (GMM-UBM) recognizer. The results, in terms of equal error rate, indicate that i-vectors outperform GMM-UBM, as expected. We also observe that in foreign accent detection, 7 out of 9 accents were detected more accurately by Gaussian scoring than by cosine scoring.

Index Terms: Dialect recognition, foreign accent recognition, i-vector, GMM-UBM, Finnish language
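The abstract reports results as equal error rate (EER) and mentions cosine scoring of i-vectors. As a rough illustration only, and not the authors' implementation, the sketch below shows one common way to compute a cosine score between two i-vectors and to approximate the EER from sets of target and non-target scores. The function names `cosine_score` and `equal_error_rate` are hypothetical, and the threshold sweep is a simple approximation chosen for clarity.

```python
import numpy as np


def cosine_score(w_test, w_model):
    """Cosine similarity between a test i-vector and a model i-vector (hypothetical helper)."""
    w_test = np.asarray(w_test, dtype=float)
    w_model = np.asarray(w_model, dtype=float)
    return float(np.dot(w_test, w_model) / (np.linalg.norm(w_test) * np.linalg.norm(w_model)))


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        miss = np.mean(target < t)               # fraction of targets rejected
        false_alarm = np.mean(nontarget >= t)    # fraction of non-targets accepted
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            # Report the midpoint of the two error rates where they are closest.
            best_gap, eer = gap, (miss + false_alarm) / 2
    return float(eer)
```

A score above the chosen threshold would be read as "accent detected"; the EER is the operating point where the miss rate and the false alarm rate are equal, so a lower value indicates a better detector.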
