Incorporating Uncertainty as a Quality Measure in I-Vector Based Language Recognition

State-of-the-art language recognition systems model utterances as i-vectors. However, the uncertainty of the i-vector extraction process, represented by the i-vector posterior covariance, is affected by factors such as channel mismatch, background noise, incomplete transformations, and duration variability. In this paper, we propose a new quality measure based on the i-vector posterior covariance and incorporate it into the recognition process to improve recognition accuracy. Experiments on the LRE15 database under various duration conditions show a 2.9% relative improvement in average performance cost when the proposed quality measure is incorporated into a language recognition system.
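To make the notion of i-vector posterior covariance concrete, the following is a minimal sketch of the standard i-vector posterior computation, assuming diagonal UBM covariances. The function and argument names (`ivector_posterior`, `T`, `sigma_diag`, `N`, `F`) are illustrative assumptions, and the trace-based scalar in `trace_quality` is only a hypothetical stand-in: the abstract does not specify the form of the proposed quality measure.

```python
import numpy as np


def ivector_posterior(T, sigma_diag, N, F):
    """Posterior mean and covariance of an i-vector (standard formulation).

    T          : (C, D, R) total variability submatrix per UBM component
    sigma_diag : (C, D)    diagonal UBM covariances (assumed diagonal)
    N          : (C,)      zeroth-order Baum-Welch statistics
    F          : (C, D)    centered first-order Baum-Welch statistics
    """
    C, D, R = T.shape
    precision = np.eye(R)          # prior precision of the latent factor
    projection = np.zeros(R)
    for c in range(C):
        # Sigma_c^{-1} T_c for a diagonal Sigma_c, shape (D, R)
        scaled = T[c] / sigma_diag[c][:, None]
        precision += N[c] * (T[c].T @ scaled)
        projection += scaled.T @ F[c]
    cov = np.linalg.inv(precision)  # posterior covariance
    mean = cov @ projection         # posterior mean, i.e. the i-vector
    return mean, cov


def trace_quality(cov):
    """Hypothetical scalar uncertainty summary: trace of the posterior
    covariance. An assumption for illustration, not the paper's measure."""
    return float(np.trace(cov))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, D, R = 4, 5, 3
    T = rng.standard_normal((C, D, R))
    sigma_diag = np.ones((C, D))
    F = rng.standard_normal((C, D))
    for frames in (1.0, 100.0):
        _, cov = ivector_posterior(T, sigma_diag, np.full(C, frames), F)
        print(frames, trace_quality(cov))
```

In this sketch, larger zeroth-order statistics (longer utterances) increase the posterior precision, so the trace of the covariance shrinks. This is one way to see why duration variability affects the uncertainty of the extracted i-vector.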
