Speaker Verification in Different Database Scenarios

This document presents the results of our speaker verification system in two scenarios: the Face and Speaker Verification Evaluation organized by MOBIO (MObile BIOmetric consortium) and the Speaker Recognition Evaluation 2010 organized by NIST. The core of our system is a Gaussian Mixture Model (GMM) and maximum likelihood (ML) framework. First, the system extracts the relevant speech features by computing Mel Frequency Cepstral Coefficients (MFCCs). The MFCCs are then used to train gender-dependent GMMs, which are later adapted to obtain the target models. To obtain reliable performance statistics, the target models are evaluated against a set of trials and a final score is calculated for each trial. Finally, each score is tagged as target or impostor. We tried several system configurations and found that each database requires its own tuning to improve performance. For the MOBIO database we obtained an average equal error rate (EER) of 16.43%. For the NIST 2010 database we obtained an average EER of 16.61%. The NIST 2010 database covers several conditions. Among them, interview training with interview testing gave the best EER, 10.94%, followed by phone call training with phone call testing at 13.35%.
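
The equal error rate quoted above is the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how an EER could be computed from trial scores, not the evaluation tooling used for the reported results. The function name `equal_error_rate` and the toy scores are assumptions made for illustration, and the sketch assumes that higher scores favour the target hypothesis.

```python
import numpy as np


def equal_error_rate(target_scores, impostor_scores):
    """Return (eer, threshold) for two sets of verification scores.

    Hypothetical helper: assumes a higher score means "more likely target".
    """
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)

    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))

    # False rejection rate: fraction of target trials scored below the threshold.
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    # False acceptance rate: fraction of impostor trials at or above the threshold.
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])

    # The EER lies where the two curves cross; take the closest sampled point.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0, thresholds[idx]


# Toy scores, invented for illustration only.
eer, threshold = equal_error_rate(
    target_scores=[2.0, 1.5, 0.5, 3.0],
    impostor_scores=[-1.0, 0.0, 1.0, -0.5],
)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With a threshold chosen this way, a trial is tagged as target when its score is at or above the threshold and as impostor otherwise. On real evaluations the two error curves rarely cross exactly at a sampled score, so interpolating between neighbouring thresholds is a common refinement.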
