Improving channel robustness in text-independent speaker verification using adaptive virtual cohort models

In speaker verification, score normalization is commonly used to improve performance and robustness. One such method is cohort normalization, which exploits information about the score behaviour of known impostors. During enrolment, impostor verification attempts are simulated to obtain a speaker-specific set of the most competitive impostors (the cohort). In this paper, a single virtual cohort speaker is synthesized from the hidden Markov models (HMMs) of the most competitive impostors. These impostors are also users of the system, so their models carry channel-specific information, in contrast to the universal background model, which provides channel- and speaker-independent models. During verification, cohort scores are obtained by an additional verification against the virtual cohort speaker. The cohort scores evaluate the candidate as an impostor. A cohort-normalized score promises greater robustness. This paper studies the effect of the proposed cohort normalization technique on the speaker verification system atip VoxGuard, which is based on mel-frequency cepstral coefficients and HMMs. VoxGuard can be used as either a text-dependent or a text-independent verification system; the emphasis here is on text-independent speaker verification. Experiments on the atip speech corpus and the SieTill speech corpus showed improvements in performance and robustness, as measured by the equal error rate.
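
The two steps described above, cohort selection at enrolment and score normalization at verification, can be illustrated with a minimal sketch. The abstract does not state the normalization formula, so the sketch assumes the common log-likelihood-difference form (claimed-speaker score minus cohort score). The function names, the cohort size and the decision threshold are hypothetical, and the synthesis of the virtual cohort speaker from the impostors' HMMs is not modelled; its log-likelihood is simply passed in as a number.

```python
from typing import Dict, List


def select_cohort(impostor_scores: Dict[str, float], size: int) -> List[str]:
    """Enrolment step (assumed form): rank simulated impostor attempts
    against the new speaker's model and keep the `size` highest-scoring,
    i.e. most competitive, impostors."""
    ranked = sorted(impostor_scores, key=impostor_scores.get, reverse=True)
    return ranked[:size]


def cohort_normalized_score(claimed_llk: float, cohort_llk: float) -> float:
    """Verification step (assumed form): subtract the log-likelihood under
    the virtual cohort speaker from the log-likelihood under the claimed
    speaker's model."""
    return claimed_llk - cohort_llk


def accept(claimed_llk: float, cohort_llk: float, threshold: float) -> bool:
    """Accept the identity claim if the normalized score reaches the
    (hypothetical) decision threshold."""
    return cohort_normalized_score(claimed_llk, cohort_llk) >= threshold


if __name__ == "__main__":
    # Made-up log-likelihoods of impostor utterances against one speaker model.
    scores = {"spk_a": -3.0, "spk_b": -1.0, "spk_c": -2.0}
    print(select_cohort(scores, size=2))
    print(accept(claimed_llk=-40.0, cohort_llk=-45.0, threshold=2.0))
```

Because the cohort models come from users recorded over the same channels, a channel effect that lowers the claimed-speaker score would be expected to lower the cohort score as well, so the difference is less sensitive to it than the raw score.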
