Effectiveness in open-set speaker identification

This paper investigates the relative effectiveness of two approaches to open-set text-independent speaker identification (OSTI-SI): the recently introduced i-vector method and the more traditional GMM-UBM method supported by score normalisation. The study is motivated by the growing need to extract intelligence and evidence effectively from audio recordings in the fight against crime. OSTI-SI is known to be the most challenging subclass of speaker recognition, and its adoption in criminal investigation is further complicated by undesired variations in speech characteristics caused by changing levels of environmental noise. The experimental investigations use a protocol developed for the identification task and based on the 2008 NIST speaker recognition evaluation (SRE) corpus. To reflect the conditions encountered in the target application areas and to examine identification performance in such scenarios, the speech data is contaminated with a range of real-world noise types. The paper gives a detailed description of the experimental study and a thorough analysis of the results.
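
The two ingredients named above, an open-set decision supported by score normalisation and speech contaminated with noise, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: cosine similarity stands in for the back-end scorer (it is a common fast scoring rule for i-vectors; a GMM-UBM system would instead produce log-likelihood ratios), T-norm is assumed as the normalisation (the abstract does not say which method is used), and noise is assumed to be added at a target signal-to-noise ratio (SNR). All function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two fixed-length speaker vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def t_norm(raw: float, cohort_scores: Sequence[float]) -> float:
    """T-norm: centre and scale a raw score using the scores of the same
    test utterance against a cohort of impostor models (assumed disjoint
    from the enrolled speakers)."""
    sigma = pstdev(cohort_scores)
    if sigma == 0:
        raise ValueError("cohort scores have zero spread")
    return (raw - mean(cohort_scores)) / sigma


def open_set_identify(
    test: Sequence[float],
    enrolled: Dict[str, Sequence[float]],
    cohort: List[Sequence[float]],
    threshold: float,
) -> Tuple[Optional[str], float]:
    """Two-stage open-set decision (hypothetical helper).

    1. Identification: find the enrolled speaker with the best score.
    2. Verification: accept that speaker only if the normalised score
       reaches the threshold; otherwise return None ("unknown speaker").
    """
    cohort_scores = [cosine_score(test, c) for c in cohort]
    normalised = {
        spk: t_norm(cosine_score(test, model), cohort_scores)
        for spk, model in enrolled.items()
    }
    best = max(normalised, key=normalised.get)
    score = normalised[best]
    return (best if score >= threshold else None), score


def add_noise_at_snr(
    speech: Sequence[float], noise: Sequence[float], snr_db: float
) -> List[float]:
    """Add noise scaled so that 10*log10(P_speech / P_noise) == snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise segment is shorter than the speech")
    noise = noise[: len(speech)]
    p_speech = mean(x * x for x in speech)
    p_noise = mean(n * n for n in noise)
    if p_noise == 0:
        raise ValueError("noise segment is silent")
    # Gain that brings the noise power down (or up) to P_speech / 10^(SNR/10).
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(speech, noise)]
```

Because the T-norm statistics in this sketch depend only on the test utterance, the normalisation does not change which enrolled speaker ranks first; it matters only at the accept/reject stage, where a single threshold has to serve every test utterance.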
