Complementary combination in i-vector level for language recognition

Recently, i-vector-based techniques have achieved good performance in language recognition (LRE). From an information-theoretic viewpoint, i-vectors derived from different acoustic features can carry complementary language information, so combining them can provide more useful information than either alone. In this paper, we propose an effective complementary combination of two kinds of i-vectors: one is derived from the commonly used short-term spectral shifted delta cepstral (SDC) features, and the other from a novel spectro-temporal time-frequency cepstrum (TFC). To overcome the curse of dimensionality and remove redundant information from the combined i-vectors, we apply principal component analysis (PCA) and linear discriminant analysis (LDA) separately and evaluate the performance of each. For classification, cosine distance scoring (CDS) and a support vector machine (SVM) are applied to the new combined i-vectors. Experiments on the NIST LRE 2009 dataset show that the proposed method outperforms the baseline, reducing the equal error rate (EER) by 1% for 30 s test segments and by 2.3% for both 10 s and 3 s segments.
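The combination pipeline described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration in pure Python, not the authors' implementation. It assumes that the SDC and TFC i-vectors of an utterance are combined by simple concatenation, uses PCA (computed by power iteration) for the dimensionality-reduction step rather than LDA, and scores a test utterance by cosine similarity against per-language mean i-vectors. The SVM back-end is not shown. All function names and the toy data are invented for illustration.

```python
import math
import random

# Hypothetical sketch: concatenation as the combination rule, PCA for
# reduction, mean-i-vector language models for cosine scoring. Toy data only.

def concat_ivectors(sdc_ivec, tfc_ivec):
    """Combine the SDC and TFC i-vectors of one utterance by concatenation."""
    return list(sdc_ivec) + list(tfc_ivec)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else v

def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def pca(vectors, k, iters=200, seed=0):
    """Return (mean, top-k principal directions) via projected power iteration."""
    mu = mean_vector(vectors)
    centered = [[x - m for x, m in zip(v, mu)] for v in vectors]
    d = len(mu)
    # Sample covariance matrix (biased, 1/N); the scale does not change directions.
    cov = [[sum(c[i] * c[j] for c in centered) / len(centered) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
        for _ in range(iters):
            w = [dot(row, v) for row in cov]
            # Remove components along directions already found (Gram-Schmidt),
            # so each new direction is orthogonal to the previous ones.
            for b in basis:
                p = dot(w, b)
                w = [wi - p * bi for wi, bi in zip(w, b)]
            v = normalize(w)
        basis.append(v)
    return mu, basis

def project(v, mu, basis):
    """Center v and map it onto the k PCA directions."""
    c = [x - m for x, m in zip(v, mu)]
    return [dot(c, b) for b in basis]

def cosine_score(a, b):
    return dot(normalize(a), normalize(b))

# --- Toy usage: two languages, 3-dim SDC and 3-dim TFC i-vectors ---
rng = random.Random(1)
true_means = {"A": ([2, 0, 0], [0, 2, 0]), "B": ([-2, 0, 0], [0, -2, 0])}

def fake_utterance(lang):
    """Draw a noisy synthetic (SDC, TFC) i-vector pair and concatenate it."""
    sdc_mu, tfc_mu = true_means[lang]
    sdc = [m + rng.gauss(0, 0.3) for m in sdc_mu]
    tfc = [m + rng.gauss(0, 0.3) for m in tfc_mu]
    return concat_ivectors(sdc, tfc)

train = {lang: [fake_utterance(lang) for _ in range(20)] for lang in true_means}
all_train = [v for vs in train.values() for v in vs]
mu, basis = pca(all_train, k=2)

# Language model = mean of the projected training i-vectors of that language.
models = {lang: mean_vector([project(v, mu, basis) for v in vs])
          for lang, vs in train.items()}

test_vec = project(fake_utterance("A"), mu, basis)
scores = {lang: cosine_score(test_vec, m) for lang, m in models.items()}
predicted = max(scores, key=scores.get)
print(scores, predicted)
```

In a real system the i-vectors would come from separately trained SDC and TFC i-vector extractors. Replacing PCA with LDA would use the language labels to choose directions that separate languages, which corresponds to the second reduction variant the abstract evaluates.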
