Foreign accent detection from spoken Finnish using i-vectors

I-vector based recognition is a well-established technique in state-of-the-art speaker and language recognition, but its use in dialect and accent classification has received less attention. We present an experimental study of i-vector based dialect classification, with a special focus on foreign accent detection from spoken Finnish. Using the CallFriend corpus, we first study how recognition accuracy is affected by the choice of i-vector system parameters, such as the number of Gaussians, the i-vector dimensionality and the dimensionality reduction method. We then apply the same methods to the Finnish national foreign language certificate (FSD) corpus and compare the results to a traditional Gaussian mixture model - universal background model (GMM-UBM) recognizer. The results, in terms of equal error rate, indicate that i-vectors outperform GMM-UBM, as expected. We also observe that in foreign accent detection, 7 out of 9 accents were detected more accurately by Gaussian scoring than by cosine scoring.

Index Terms: Dialect recognition, foreign accent recognition, i-vector, GMM-UBM, Finnish language
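For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of cosine scoring between i-vectors and of an equal error rate (EER) estimate. It is not the authors' implementation: the function names `cosine_score` and `equal_error_rate` are hypothetical, representing an accent model by a single i-vector is an assumption, and the EER is approximated by a simple threshold sweep over the observed scores.

```python
import numpy as np


def cosine_score(w_test, w_model):
    """Cosine similarity between a test i-vector and a model i-vector.

    Assumption: each accent is represented by one i-vector (for example
    the mean of its training i-vectors); the abstract does not say so.
    """
    w_test = np.asarray(w_test, dtype=float)
    w_model = np.asarray(w_model, dtype=float)
    denom = np.linalg.norm(w_test) * np.linalg.norm(w_model)
    return float(np.dot(w_test, w_model) / denom)


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping the decision threshold over all scores.

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the miss rate and the false-alarm rate
    are closest, and is reported as their average.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))

    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        miss = np.mean(target < t)        # targets wrongly rejected
        fa = np.mean(nontarget >= t)      # non-targets wrongly accepted
        gap = abs(miss - fa)
        if gap < best_gap:
            best_gap, eer = gap, (miss + fa) / 2.0
    return float(eer)
```

In this sketch, each test utterance would be scored against every accent model with `cosine_score`, and the pooled target and non-target scores would be passed to `equal_error_rate`. Gaussian scoring, the alternative mentioned in the abstract, would replace `cosine_score` with a log-likelihood under a per-accent Gaussian and is not shown here.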
