A Simple Way to Extract I-vector from Normalized Statistics

In the i-vector model, utterance statistics are extracted from acoustic features using a universal background model (UBM). Each utterance is then mapped to a vector in the total variability space, called an i-vector. The total variability space provides a basis for a low-dimensional, fixed-length representation of a speech utterance. However, the extraction procedure is complicated because the computation of the statistics is interwoven with the machine learning method. We therefore separate the two and propose a simple way to extract i-vectors from normalized statistics using classical principal component analysis (PCA), factor analysis (FA), and independent component analysis (ICA). Results on NIST 2008 telephone data show that the performance of these methods is very close to that of the traditional method and improves noticeably after score fusion.
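
The separation described in the abstract can be illustrated with a minimal sketch: first compute per-utterance statistics against a UBM and normalize them, then apply an off-the-shelf projection. The abstract does not specify the normalization, so the one used here (centering the first-order statistics on the UBM means, then dividing by the occupancy counts and the UBM standard deviations) is an assumption, as are the function names, the diagonal-covariance UBM, and the random toy data. Only the PCA variant is shown.

```python
import numpy as np

def normalized_supervector(feats, weights, means, variances):
    """Normalized first-order statistics of one utterance (assumed scheme).

    feats: (T, D) frames; weights: (C,); means, variances: (C, D),
    i.e. a diagonal-covariance GMM standing in for the UBM.
    """
    # Per-frame log-likelihood under each diagonal Gaussian component.
    diff = feats[:, None, :] - means[None, :, :]                 # (T, C, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    # Component posteriors (responsibilities), computed stably.
    log_post = np.log(weights) + log_gauss                       # (T, C)
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    n = post.sum(axis=0)                                         # zeroth-order, (C,)
    f = post.T @ feats                                           # first-order, (C, D)
    # Assumption: center on UBM means, scale by occupancy and std. dev.
    centered = f - n[:, None] * means
    scaled = centered / (np.maximum(n, 1e-8)[:, None] * np.sqrt(variances))
    return scaled.ravel()                                        # (C * D,)

def fit_pca(supervectors, dim):
    """Return the mean and the top `dim` principal directions (rows)."""
    mean = supervectors.mean(axis=0)
    _, _, vt = np.linalg.svd(supervectors - mean, full_matrices=False)
    return mean, vt[:dim]

def extract(supervector, mean, basis):
    """Project one normalized supervector onto the PCA basis."""
    return basis @ (supervector - mean)

# Toy data only: a random 4-component UBM over 3-dimensional features.
rng = np.random.default_rng(0)
C, D, dim = 4, 3, 5
weights = np.full(C, 1.0 / C)
means = rng.normal(size=(C, D))
variances = np.ones((C, D))

train = np.stack([
    normalized_supervector(rng.normal(size=(50, D)), weights, means, variances)
    for _ in range(20)
])
mean, basis = fit_pca(train, dim)
ivec = extract(train[0], mean, basis)   # low-dimensional utterance vector
```

Because the statistics are computed independently of the projection, the PCA step could in principle be swapped for a factor analysis or ICA routine from a standard library without touching `normalized_supervector`.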
