The 2013 speaker recognition evaluation in mobile environment

This paper evaluates the performance of the twelve primary systems submitted to the speaker verification evaluation in a mobile environment, which uses the MOBIO database. The mobile environment provides a challenging and realistic test-bed for current state-of-the-art speaker verification techniques. Results in terms of equal error rate (EER), half total error rate (HTER) and detection error trade-off (DET) curves confirm that the best-performing systems are based on total variability modeling and fuse several sub-systems. Nevertheless, classical UBM-GMM-based systems remain competitive. The results also show that using additional training data, as well as gender-dependent features, can be helpful.
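The abstract reports results as EER, HTER and DET curves. The following is a minimal Python sketch of how EER and HTER are commonly computed from verification scores. It is an illustration only and not the evaluation's official scoring code.

The sketch makes several assumptions:
- A higher score means "same speaker".
- The EER is approximated at the observed-score threshold where the false acceptance rate (FAR) and false rejection rate (FRR) are closest.
- As is common in MOBIO-style protocols, the threshold is fixed at the EER point of a development set, and HTER is then measured on a separate evaluation set.

The function names and the toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def error_rates(genuine: Sequence[float], impostor: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR); a trial is accepted when its score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def eer(genuine: Sequence[float], impostor: Sequence[float]) -> Tuple[float, float]:
    """Approximate the EER; returns (threshold, EER).

    Only observed scores are tried as thresholds; the one minimising
    |FAR - FRR| is kept, and the EER is reported as (FAR + FRR) / 2 there.
    """
    candidates: List[float] = sorted(set(genuine) | set(impostor))
    best_t, best_gap, best_eer = candidates[0], float("inf"), 1.0
    for t in candidates:
        far, frr = error_rates(genuine, impostor, t)
        if abs(far - frr) < best_gap:
            best_t, best_gap, best_eer = t, abs(far - frr), (far + frr) / 2
    return best_t, best_eer


def hter(genuine: Sequence[float], impostor: Sequence[float],
         threshold: float) -> float:
    """Half total error rate at a fixed (e.g. development-set) threshold."""
    far, frr = error_rates(genuine, impostor, threshold)
    return (far + frr) / 2


if __name__ == "__main__":
    # Hypothetical toy scores, not MOBIO data.
    dev_genuine, dev_impostor = [0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6]
    eval_genuine, eval_impostor = [0.95, 0.65, 0.85, 0.75], [0.2, 0.7, 0.1, 0.3]

    t_dev, dev_eer = eer(dev_genuine, dev_impostor)
    print("dev threshold:", t_dev, "dev EER:", dev_eer)
    print("eval HTER:", hter(eval_genuine, eval_impostor, t_dev))
```

Fixing the threshold on development data means the evaluation-set HTER reflects how well a system's calibration generalises, rather than an a posteriori optimum. A DET curve plots the same (FAR, FRR) pairs over all thresholds, on normal-deviate axes.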
