MULTIMODAL PERSON IDENTIFICATION IN A SMART ROOM

In this paper we present a person identification system that combines acoustic features with 2D face images. We address the modality integration problem in the setting of a smart room environment. To improve on the results of the individual modalities, the outputs of the audio and video classifiers are combined by applying a set of normalization and fusion techniques. We first introduce the monomodal acoustic and video identification approaches and then present the use of combined speech and face images for person identification. The two sensory modalities, speech and faces, are processed both individually and jointly. The results obtained in the CLEAR’06 Evaluation Campaign show that the multimodal approach improves on the individual modalities in identifying the participants.
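
The abstract does not say which normalization and fusion techniques are used, so the following is only an illustrative sketch of score-level fusion in general. It assumes min-max normalization and a weighted sum, two common choices that are not confirmed as the authors' method. The function names, the `audio_weight` parameter, and the example scores are all hypothetical.

```python
def min_max_normalize(scores):
    """Map raw classifier scores to [0, 1] so both modalities share a scale."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All candidates scored the same: this modality carries no evidence.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(audio_scores, video_scores, audio_weight=0.5):
    """Weighted-sum fusion of normalized per-identity scores (assumed rule)."""
    if len(audio_scores) != len(video_scores):
        raise ValueError("both modalities must score the same identities")
    audio = min_max_normalize(audio_scores)
    video = min_max_normalize(video_scores)
    return [audio_weight * a + (1.0 - audio_weight) * v
            for a, v in zip(audio, video)]


def identify(audio_scores, video_scores, audio_weight=0.5):
    """Return the index of the identity with the highest fused score."""
    fused = fuse_scores(audio_scores, video_scores, audio_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores for three enrolled identities: log-likelihoods from a
# speaker model and similarity scores from a face classifier.
audio_scores = [-120.0, -95.0, -100.0]
video_scores = [0.2, 0.6, 0.9]
best = identify(audio_scores, video_scores)
```

In this made-up example the audio scores alone favor identity 1 and the video scores alone favor identity 2. Normalizing first keeps the large-magnitude log-likelihoods from dominating the sum, and the weight controls how much each modality is trusted.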
