Speaker recognition using common passphrases in RedDots

In this paper, we report our work on the recently collected text-dependent speaker recognition dataset RedDots, with a focus on the common passphrase condition. We first investigate an out-of-the-box approach. We then report several strategies for training on RedDots itself, using up to 40 speakers for training. The GMM-NAP framework is used as a baseline. We report the following novelties. First, we demonstrate the use of bagging for improved accuracy. Second, we estimate the equal error rate (EER) of a passphrase using metadata only. Third, we use the estimated EERs for improved score normalization. Finally, we report an analysis of system sensitivity to the time elapsed between enrollment and testing (template aging).
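As a point of reference for the EER metric named in the abstract, the following is a minimal sketch of how an EER is conventionally measured from trial scores. It is not the metadata-based estimator the paper proposes; the function name `equal_error_rate` and the toy scores are assumptions made for illustration only.

```python
import numpy as np


def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER: the operating point where the false-accept
    rate and the false-reject rate are (nearly) equal.

    Illustrative helper only; not taken from the paper.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, impostor]))

    # A trial is accepted when its score is >= the threshold.
    # False reject: a target trial scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False accept: an impostor trial scoring at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest
    # and report their average as the EER.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


if __name__ == "__main__":
    # Toy scores, invented for the example.
    print(equal_error_rate([2.0, 3.0, 4.0], [-1.0, 0.0, 1.0]))
```

With perfectly separated target and impostor scores, as in the toy call, a threshold exists at which both error rates are zero. Real systems typically interpolate between thresholds rather than taking the nearest point, which matters when the number of trials is small.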
