The 2013 speaker recognition evaluation in mobile environment

This paper evaluates the twelve primary systems submitted to the 2013 evaluation on speaker verification in a mobile environment, conducted on the MOBIO database. The mobile environment provides a challenging and realistic test-bed for current state-of-the-art speaker verification techniques. Results reported as equal error rate (EER), half total error rate (HTER) and detection error trade-off (DET) curves confirm that the best-performing systems are based on total variability modeling and fuse several sub-systems. Nevertheless, conventional UBM-GMM based systems remain competitive. The results also show that additional training data and gender-dependent features can be helpful.
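To make the two scalar metrics concrete, the following is a minimal sketch of how EER and HTER could be computed from verification scores. It assumes a common protocol in which the decision threshold is fixed at the equal error rate point of a development set and then applied unchanged to the evaluation set; the function names and the toy scores are hypothetical and are not taken from the evaluation itself.

```python
import numpy as np

def far_frr(impostor, genuine, threshold):
    """Return (FAR, FRR) when scores >= threshold are accepted."""
    impostor = np.asarray(impostor, dtype=float)
    genuine = np.asarray(genuine, dtype=float)
    far = float(np.mean(impostor >= threshold))  # impostors wrongly accepted
    frr = float(np.mean(genuine < threshold))    # genuine trials wrongly rejected
    return far, frr

def eer_threshold(impostor, genuine):
    """Return the observed score at which |FAR - FRR| is smallest."""
    candidates = np.unique(np.concatenate([impostor, genuine]))
    gaps = [abs(far - frr)
            for far, frr in (far_frr(impostor, genuine, t) for t in candidates)]
    return float(candidates[int(np.argmin(gaps))])

def hter(impostor, genuine, threshold):
    """Half total error rate: the mean of FAR and FRR at a fixed threshold."""
    far, frr = far_frr(impostor, genuine, threshold)
    return (far + frr) / 2.0

# Toy scores (illustrative only, not MOBIO data).
dev_impostor = [0.1, 0.2, 0.3, 0.6]
dev_genuine = [0.4, 0.7, 0.8, 0.9]
eval_impostor = [0.1, 0.5, 0.7, 0.2]
eval_genuine = [0.5, 0.55, 0.8, 0.9]

threshold = eer_threshold(dev_impostor, dev_genuine)       # tuned on development data
dev_far, dev_frr = far_frr(dev_impostor, dev_genuine, threshold)
eval_hter = hter(eval_impostor, eval_genuine, threshold)   # reported on evaluation data
print(threshold, dev_far, dev_frr, eval_hter)
```

Fixing the threshold on development data keeps the evaluation figure free of any tuning on the evaluation scores, which is why HTER can differ from the development EER. A DET curve plots the same FAR and FRR pairs over all thresholds rather than at a single operating point.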
