Linearly Constrained Minimum Variance for Robust I-vector Based Speaker Recognition

This paper presents the algorithm behind our submission to the NIST 2013-2014 speaker recognition i-vector challenge. The fixed-dimensional i-vector representation of speech utterances has attracted attention from other research communities. The challenge focuses on the task of speaker detection using i-vectors derived from conversational telephony speech data. The development i-vectors, however, are provided without labels, which makes the problem more challenging. The proposed method borrows the idea of Linearly Constrained Minimum Variance (LCMV), a popular robust beamforming technique originally presented for signal enhancement. We show that LCMV can improve performance by building a model from the different i-vectors of a given speaker so as to cancel inter-session variability and increase inter-speaker variability. We also propose impostor covariance matrix modification and score normalization using a selection of impostor speakers to further improve performance. As measured by the minimum decision cost function defined in the challenge, our result is a 27% relative improvement over the baseline system.
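To make the LCMV idea concrete, the following is a minimal sketch of the textbook LCMV solution, w = R^{-1} C (C^T R^{-1} C)^{-1} f, which minimizes w^T R w subject to C^T w = f. The mapping to the speaker recognition setting is an assumption made for illustration only: the columns of C stand in for the enrollment i-vectors of one target speaker, R stands in for an impostor covariance matrix, and the function name, diagonal loading value, and random data are hypothetical. It is not the exact formulation used in the paper.

```python
import numpy as np


def lcmv_weights(R, C, f, loading=1e-3):
    """Textbook LCMV: minimize w^T R w subject to C^T w = f.

    R : (d, d) covariance matrix (assumed here: impostor i-vectors)
    C : (d, k) constraint matrix (assumed here: one speaker's i-vectors)
    f : (k,)   desired responses to the constraint vectors
    """
    d = R.shape[0]
    # Diagonal loading (an illustrative choice) keeps R well conditioned.
    R_loaded = R + loading * np.trace(R) / d * np.eye(d)
    Rinv_C = np.linalg.solve(R_loaded, C)        # R^{-1} C, shape (d, k)
    gram = C.T @ Rinv_C                          # C^T R^{-1} C, shape (k, k)
    return Rinv_C @ np.linalg.solve(gram, f)     # shape (d,)


# Synthetic stand-in data; real i-vectors would replace these.
rng = np.random.default_rng(0)
d, k, n_impostors = 20, 3, 500
impostors = rng.standard_normal((n_impostors, d))
R = impostors.T @ impostors / n_impostors        # impostor covariance estimate
C = rng.standard_normal((d, k))                  # enrollment i-vectors as columns
f = np.ones(k)                                   # unit response to each of them

w = lcmv_weights(R, C, f)

# Scoring a test i-vector is then a simple inner product with the model.
test_ivector = rng.standard_normal(d)
score = float(w @ test_ivector)
```

Under these assumptions, the resulting model w responds with exactly 1 to each enrollment i-vector while keeping its output variance on impostor data as small as possible, which is one way to read the abstract's goal of suppressing session variability while separating speakers.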
