Domain adaptation for text dependent speaker verification

Recently we investigated the use of state-of-the-art text-dependent speaker verification algorithms for user authentication and obtained satisfactory results, mainly by using a fair amount of text-dependent development data from the target domain. In this work we investigate the ability to build high-accuracy text-dependent systems using no data at all from the target domain. Instead, we use resources such as TIMIT, Switchboard, and NIST data. We introduce several techniques that address both lexical mismatch and channel mismatch. These techniques include synthesizing a universal background model according to lexical content, automatically filtering out irrelevant phonetic content, exploiting information in residual supervectors (usually discarded in the i-vector framework), and inter-dataset variability modeling. These techniques reduce verification error significantly, and also improve accuracy when target domain data is available.
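The abstract does not spell out how inter-dataset variability is modeled, so the following is only an illustrative sketch under our own assumptions, not the authors' implementation. It assumes each development dataset is represented by a matrix of fixed-length vectors (for example i-vectors), estimates the directions along which the dataset means differ, and projects those directions out, in the spirit of a nuisance-attribute-style projection. The function names `dataset_mean_subspace` and `remove_subspace`, the synthetic data, and the choice of `rank` are all hypothetical.

```python
import numpy as np


def dataset_mean_subspace(vectors_by_dataset, rank):
    """Return an orthonormal basis (dim x rank) for the top directions
    along which the per-dataset mean vectors vary. Illustrative only."""
    means = np.stack([v.mean(axis=0) for v in vectors_by_dataset])
    centered = means - means.mean(axis=0)
    # Rows of vt are the principal directions of the between-dataset mean scatter.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:rank].T


def remove_subspace(vectors, basis):
    """Project vectors onto the orthogonal complement of span(basis)."""
    return vectors - (vectors @ basis) @ basis.T


# Synthetic stand-ins for two development datasets (assumption: 8-dim vectors).
rng = np.random.default_rng(0)
dim = 8
shift = np.zeros(dim)
shift[0] = 5.0  # artificial dataset-dependent offset along one axis
set_a = rng.normal(size=(200, dim))
set_b = rng.normal(size=(200, dim)) + shift

basis = dataset_mean_subspace([set_a, set_b], rank=1)
clean_a = remove_subspace(set_a, basis)
clean_b = remove_subspace(set_b, basis)

# Distance between dataset means before and after the projection.
gap_before = np.linalg.norm(set_a.mean(axis=0) - set_b.mean(axis=0))
gap_after = np.linalg.norm(clean_a.mean(axis=0) - clean_b.mean(axis=0))
print(gap_before, gap_after)
```

In this toy setting the projection removes the direction separating the two dataset means, so the gap between the means shrinks to numerical noise. With real data the subspace would be estimated from many datasets, and `rank` would need tuning so that speaker-discriminative directions are not removed along with the dataset shift.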
