Residual Factor Analysis for Text-Independent Speaker Verification

Joint Factor Analysis (JFA) has become the state-of-the-art technique for speaker verification [1, 2]. However, training the eigenvoice matrix is a heavy burden, because it requires a large amount of multi-channel data, which largely determines the performance of the system. In this paper, we first explore an upper bound on the performance of the JFA system in an unconventional way. We then propose a new technique, which we refer to as Residual Factor Analysis (RFA), in which the eigenvoice matrix of the JFA system is replaced with the residual vector, removing the burden of training the eigenvoice matrix. We tested the proposed technique on the core condition of the NIST 2006 Speaker Recognition Evaluation (SRE 06) and obtained results equivalent to those of the JFA system (an equal error rate of about 3.99%) [1], while our method requires no extra multi-channel data beyond that used to train the eigenchannel matrix.
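
For orientation, the speaker- and channel-dependent supervector decomposition commonly written for JFA in the literature can be sketched as follows. The symbols are not defined in this abstract and are assumed from the usual convention: M is the supervector for an utterance, m the speaker-independent (UBM) supervector, V the eigenvoice matrix, U the eigenchannel matrix, D a diagonal residual matrix, and y, x, z the speaker, channel, and residual factors. The second equation is only one plausible reading of "replace the eigenvoice matrix with the residual vector", not the paper's stated formulation.

```latex
% Common JFA decomposition (notation assumed, not taken from this abstract)
\[
  M = m + V y + U x + D z
\]
% Hypothetical reading of RFA: the eigenvoice term V y is dropped and the
% speaker offset is carried by the residual term alone
\[
  M_{\mathrm{RFA}} = m + U x + D z
\]
```

Under this reading, only U would need to be estimated from multi-channel data, which would be consistent with the claim that no extra multi-channel data is required beyond that used for the eigenchannel matrix.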

[1] Roland Auckenthaler, et al. Score Normalization for Text-Independent Speaker Verification Systems, 2000, Digit. Signal Process.

[2] Patrick Kenny, et al. A Joint Factor Analysis Approach to Progressive Model Adaptation in Text-Independent Speaker Verification, 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[3] Lukáš Burget, et al. Comparison of scoring methods used in speaker recognition with Joint Factor Analysis, 2009, IEEE International Conference on Acoustics, Speech and Signal Processing.

[4] Patrick Kenny, et al. Disentangling speaker and channel effects in speaker verification, 2004, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5] Patrick Kenny, et al. The Geometry of the Channel Space in GMM-Based Speaker Recognition, 2006, IEEE Odyssey - The Speaker and Language Recognition Workshop.

[6] Sridha Sridharan, et al. Feature warping for robust speaker verification, 2001, Odyssey.

[7] Douglas A. Reynolds, et al. A Tutorial on Text-Independent Speaker Verification, 2004, EURASIP J. Adv. Signal Process.

[8] Patrick Kenny, et al. Factor analysis simplified [speaker verification applications], 2005, Proceedings (ICASSP '05), IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9] Patrick Kenny, et al. Development of the primary CRIM system for the NIST 2008 speaker recognition evaluation, 2008, INTERSPEECH.

[10] Patrick Kenny, et al. Joint Factor Analysis Versus Eigenchannels in Speaker Recognition, 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[11] Patrick Kenny, et al. Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms, 2006.

[12] Patrick Kenny, et al. A Study of Interspeaker Variability in Speaker Verification, 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[13] Sridha Sridharan, et al. Experiments in Session Variability Modelling for Speaker Verification, 2006, IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[14] Douglas A. Reynolds, et al. Speaker Verification Using Adapted Gaussian Mixture Models, 2000, Digit. Signal Process.