On the Improvements of Uni-modal and Bi-modal Fusions of Speaker and Face Recognition for Mobile Biometrics

The MOBIO database provides a challenging test-bed for speaker and face recognition systems because it includes voice and face samples as they would appear in forensic scenarios. In this paper, we investigate uni-modal and bi-modal multi-algorithm fusion using logistic regression. The source speaker and face recognition systems were taken from the 2013 speaker and face recognition evaluations held in the context of the International Conference on Biometrics (ICB-2013). Following the unbiased MOBIO protocols, we report the equal error rate (EER), the half-total error rate (HTER) and detection error trade-off (DET) curves. The results show that uni-modal algorithm fusion reduces the HTERs of the speaker recognition systems by around 35%, and those of the face recognition systems by between 15% and 20%. Bi-modal fusion yields a relative performance gain of 65% to 70% over the best uni-modal system.
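To make the evaluation measures concrete, the following is a minimal sketch of how an HTER and a relative reduction could be computed. It assumes the usual two-set setup in which a threshold is chosen at the EER point on a development set and then applied unchanged to an evaluation set. All function names and score values are hypothetical illustrations, not data or code from the paper.

```python
def far_frr(impostor, genuine, threshold):
    """Return (FAR, FRR): share of impostor scores accepted and
    share of genuine scores rejected at the given threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def eer_threshold(impostor, genuine):
    """Pick the observed score at which |FAR - FRR| is smallest."""
    def gap(t):
        far, frr = far_frr(impostor, genuine, t)
        return abs(far - frr)
    candidates = sorted(set(impostor) | set(genuine))
    return min(candidates, key=gap)


def hter(impostor, genuine, threshold):
    """Half-total error rate: the mean of FAR and FRR."""
    far, frr = far_frr(impostor, genuine, threshold)
    return (far + frr) / 2


def relative_reduction(baseline, improved):
    """Relative error reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


# Made-up scores (higher means "more likely the claimed identity").
dev_impostor = [-2.0, -1.5, -1.0, -0.5, 0.5]
dev_genuine = [-0.8, 0.2, 1.0, 1.5, 2.0]
eval_impostor = [-1.8, -1.2, -0.3, 0.4, -0.9]
eval_genuine = [0.1, 0.9, 1.4, -0.2, 2.2]

# Threshold is fixed on the development set, then reused on the evaluation set.
threshold = eer_threshold(dev_impostor, dev_genuine)
eval_hter = hter(eval_impostor, eval_genuine, threshold)
```

Fixing the threshold on one set and measuring the error on another keeps the reported HTER from being tuned on the data it is measured on. With these definitions, a hypothetical drop in HTER from 0.10 to 0.065 corresponds to `relative_reduction(0.10, 0.065)`, that is, a relative reduction of about 35%.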
