Multi-feature combination for speaker recognition

Combining different features has proven to be an effective way to improve performance in speech recognition. In speaker recognition (SRE), a variety of features have likewise been developed to capture complementary aspects of a speaker's characteristics. This paper proposes an effective multi-feature combination method for speaker recognition. To avoid the curse of dimensionality and to reduce redundant information, linear discriminant analysis (LDA) is used to project the combined feature vector into a lower-dimensional space. Feature-domain channel compensation is then applied to further improve performance. In the experiments, the widely used short-term spectral Mel-frequency cepstral coefficients (MFCC) are combined with the more recent spectro-temporal time-frequency cepstrum (TFC), followed by LDA for dimensionality reduction and feature-domain latent factor analysis (fLFA) for channel compensation. Results on the NIST SRE2008 short2 telephone-short3 telephone test set show that the proposed multi-feature combination outperforms either feature used on its own.
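The abstract describes a pipeline of frame-level feature concatenation followed by supervised LDA dimensionality reduction. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes the MFCC and TFC streams have already been extracted and frame-aligned (no standard library provides a TFC extractor, so both inputs are stand-in arrays), uses scikit-learn's LinearDiscriminantAnalysis with speaker identities as class labels, and omits the fLFA channel-compensation step entirely.

```python
# Minimal sketch: concatenate two frame-aligned feature streams and reduce
# dimensionality with LDA, using speaker identities as the class labels.
# `mfcc_feats`, `tfc_feats`, and `speaker_labels` are hypothetical inputs.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def combine_and_reduce(mfcc_feats, tfc_feats, speaker_labels, out_dim=40):
    """Concatenate two frame-aligned feature matrices (n_frames, dim) and
    project the result to out_dim dimensions with supervised LDA."""
    assert mfcc_feats.shape[0] == tfc_feats.shape[0], "streams must be frame-aligned"
    combined = np.hstack([mfcc_feats, tfc_feats])           # (n_frames, d_mfcc + d_tfc)
    lda = LinearDiscriminantAnalysis(n_components=out_dim)  # out_dim must be < n_speakers
    reduced = lda.fit_transform(combined, speaker_labels)   # supervised projection
    return reduced, lda

# Toy usage with random stand-ins for real features:
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(1000, 39))        # e.g. 13 MFCCs + deltas + delta-deltas
tfc = rng.normal(size=(1000, 56))         # hypothetical TFC dimensionality
labels = rng.integers(0, 50, size=1000)   # 50 training speakers
reduced, lda_model = combine_and_reduce(mfcc, tfc, labels, out_dim=40)
print(reduced.shape)                      # (1000, 40)
```

In practice the LDA projection trained on the development data would be applied to all enrollment and test utterances before channel compensation and GMM/SVM modeling; the choice of out_dim is a tuning parameter, bounded above by the number of training speakers minus one.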
