A Multibiometric Speaker Authentication System with SVM Audio Reliability Indicator

Biometric speaker authentication systems perform well in clean conditions, but their reliability drops severely in noisy environments. Multibiometric systems that combine audio and visual experts are one solution to this limitation. The common approach fuses the audio and visual expert scores with a fixed weighting, which is inappropriate when the system operates in uncertain conditions. In this study, we propose adapting the fusion weighting to the current environment by introducing a Support Vector Machine (SVM) as an indicator system for audio reliability estimation. This approach directly assesses the quality of the incoming (claimant) speech signal and adaptively changes the weighting factor used to fuse the scores of the two subsystems. Checking the speech signal quality beforehand is important because unreliable speech data yield incorrect scores and therefore degrade the accuracy of the fused score. The approach is evaluated on a multibiometric authentication system that uses lipreading images as visual features and SVM classifiers for both subsystems. Principal Component Analysis (PCA) is used for visual feature extraction, and Linear Predictive Coding (LPC) is used for audio feature extraction. We found that the SVM indicator system determines the quality of the speech signal with an accuracy of up to 99.66%. For comparison, the equal error rates (EER) at 10 dB are 51.13% for the audio-only system, 9.3% for the fixed weighting system, and 0.27% for the adaptive weighting system.
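The adaptive weighting idea can be illustrated with a minimal sketch. It assumes a simple weighted-sum fusion rule and a binary reliable/unreliable output from the audio indicator; the abstract does not give the exact fusion formula, so the function names, the weight values (`w_clean`, `w_noisy`), and the decision threshold below are hypothetical placeholders rather than the authors' settings.

```python
def fuse_scores(audio_score, visual_score, audio_weight):
    """Weighted-sum fusion of two expert scores (assumed fusion rule)."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio_score + (1.0 - audio_weight) * visual_score


def adaptive_audio_weight(audio_is_reliable, w_clean=0.5, w_noisy=0.1):
    """Pick the audio weight from the reliability indicator's decision.

    audio_is_reliable stands in for the output of the SVM indicator;
    w_clean and w_noisy are illustrative values, not taken from the paper.
    """
    return w_clean if audio_is_reliable else w_noisy


def authenticate(audio_score, visual_score, audio_is_reliable, threshold=0.0):
    """Accept the claimant if the adaptively fused score reaches the threshold."""
    weight = adaptive_audio_weight(audio_is_reliable)
    return fuse_scores(audio_score, visual_score, weight) >= threshold


if __name__ == "__main__":
    # A noisy utterance gives a misleading audio score for a genuine claimant.
    audio_score, visual_score = -2.0, 1.0
    print(authenticate(audio_score, visual_score, audio_is_reliable=True))
    print(authenticate(audio_score, visual_score, audio_is_reliable=False))
```

In this toy case, treating the noisy audio as reliable lets the corrupted audio score dominate and the claim is rejected, whereas down-weighting the audio expert lets the visual score carry the decision.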
