Audio-Visual Biometric Based Speaker Identification

In this paper, we present a multimodal audio-visual speaker identification system. The proposed system decomposes the information in a video stream into two components: speech and lip motion. Prior studies have shown that lip information conveys not only speech content but also characteristic information about a person's identity. Fusing this information with speech information is expected to yield robust person identification under adverse conditions. Gaussian mixture models (GMMs) and hidden Markov models (HMMs) are used throughout this work for the tasks of text-dependent speaker recognition and mouth tracking. The performance is evaluated on a dataset of 22 Indian speakers of different ethnicities, each uttering a sentence. The results show that the performance of the biometric system is significantly better when both audio and video features are used.
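The abstract does not state which fusion rule is used, so the following is only a minimal sketch of one common option, assuming score-level (late) fusion: each modality produces one log-likelihood per enrolled speaker, the scores are z-normalized per modality, and a weighted sum selects the identity. The function names, the `audio_weight` parameter, and the score values are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def zscore(scores):
    """Normalize one modality's scores to zero mean and unit variance."""
    scores = np.asarray(scores, dtype=float)
    std = scores.std()
    if std == 0:
        # All speakers scored identically: this modality carries no evidence.
        return np.zeros_like(scores)
    return (scores - scores.mean()) / std


def fuse_and_identify(audio_loglik, visual_loglik, audio_weight=0.5):
    """Weighted-sum fusion of per-speaker scores (assumed rule, not the paper's).

    audio_loglik, visual_loglik: one log-likelihood per enrolled speaker,
    e.g. from per-speaker GMMs or HMMs scored on the test utterance.
    audio_weight: hypothetical reliability weight in [0, 1] for the audio stream.
    Returns the index of the best-scoring speaker and the fused scores.
    """
    fused = (audio_weight * zscore(audio_loglik)
             + (1.0 - audio_weight) * zscore(visual_loglik))
    return int(np.argmax(fused)), fused


# Made-up scores for three enrolled speakers (illustration only).
audio = [-410.0, -395.0, -402.0]
visual = [-120.0, -131.0, -111.0]

best, fused = fuse_and_identify(audio, visual, audio_weight=0.5)
audio_only, _ = fuse_and_identify(audio, visual, audio_weight=1.0)
print(best, audio_only)
```

In this made-up case the audio stream alone favors speaker 1, while the fused score favors speaker 2 because the lip-motion score disagrees strongly. Lowering `audio_weight` is one way to rely more on the visual stream when the audio is noisy.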
