Exploiting i-vector posterior covariances for short-duration language recognition

Linear models in i-vector space have been shown to be an effective solution not only for speaker identification, but also for language recognition. The i-vector extraction process, however, is affected by several factors, such as the noise level, the acoustic content of the utterance, and the duration of the spoken segments. These factors influence both the i-vector estimate and its uncertainty, which is represented by the i-vector posterior covariance matrix. Modeling i-vector uncertainty with Probabilistic Linear Discriminant Analysis has been shown to be effective for short-duration speaker identification. This paper extends the approach to language recognition, analyzing the effects of i-vector covariances on a state-of-the-art Gaussian classifier, and proposes an effective solution for reducing the average detection cost (Cavg) for short segments.
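To make the role of the posterior covariance concrete, the following is a minimal sketch of one common way to propagate i-vector uncertainty into a Gaussian classifier with a shared within-class covariance: the utterance-specific posterior covariance is added to the shared covariance before scoring. This is an illustrative assumption, not necessarily the formulation proposed in the paper; the function name `gaussian_log_likelihoods` and the symbols `w`, `C`, `means`, and `W` are hypothetical.

```python
import numpy as np

def gaussian_log_likelihoods(w, C, means, W):
    """Score an i-vector against each language with an uncertainty-aware Gaussian.

    w     : (D,)   i-vector point estimate (posterior mean)
    C     : (D, D) i-vector posterior covariance for this utterance
    means : (K, D) one mean per language
    W     : (D, D) within-class covariance shared by all languages

    Assumption (not taken from the paper): the likelihood for language k
    is N(w; means[k], W + C), so short, uncertain segments get a broader
    Gaussian and therefore less extreme scores.
    """
    S = W + C                                   # utterance-dependent covariance
    L = np.linalg.cholesky(S)                   # S = L @ L.T
    diffs = (means - w).T                       # (D, K)
    z = np.linalg.solve(L, diffs)               # whitened differences
    maha = np.sum(z ** 2, axis=0)               # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(L)))   # log |S|
    D = w.shape[0]
    return -0.5 * (maha + logdet + D * np.log(2.0 * np.pi))

# Toy usage: two languages in a 2-dimensional i-vector space.
means = np.array([[0.0, 0.0], [2.0, 0.0]])
W = np.eye(2)
w = np.array([0.1, 0.0])
ll_certain = gaussian_log_likelihoods(w, np.zeros((2, 2)), means, W)
ll_uncertain = gaussian_log_likelihoods(w, 4.0 * np.eye(2), means, W)
```

In this toy setting the language decision is the same in both cases, but the log-likelihood gap between the two languages shrinks as the posterior covariance grows, which is the qualitative behavior one would expect for short segments.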
