Multi-modal person verification tools using speech and images

We propose multi-modal person verification using voice and images as a solution to the secured access problem. The necessary I/O devices are now standard and cheaply available and, above all, they capture the two most important human communication modalities. The visual part currently involves i) matching of a coarse grid containing Gabor phase information from face images, ii) facial feature localization and extraction, and iii) 3D biometrical feature extraction by structured light. In the acoustic part, LPC coefficients are extracted from the speech signal, and three different classifiers (DTW, SOSM and HMM) are used in parallel to compare the resulting voice references. The global decision is taken by applying a Furui threshold to each individual method and combining the individual results according to a majority rule.
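The decision stage described above can be illustrated with a minimal sketch. It assumes that each classifier returns a distance to the claimed identity's reference, that a smaller distance means a better match, and that a per-method threshold is already available. The function name `majority_decision`, the distance convention, and all numeric values are hypothetical; the abstract does not say how the Furui thresholds are computed, so they are treated here as given inputs.

```python
from typing import Mapping


def majority_decision(
    distances: Mapping[str, float],
    thresholds: Mapping[str, float],
) -> bool:
    """Accept the identity claim if more than half of the methods accept it.

    Assumption: a method accepts when its distance is at or below its own
    threshold. The thresholds are hypothetical inputs, not derived here.
    """
    # One boolean vote per method (e.g. DTW, SOSM, HMM).
    votes = [distances[name] <= thresholds[name] for name in distances]
    # Strict majority of the individual decisions.
    return sum(votes) > len(votes) / 2


# Illustrative values only: two of the three methods accept the claim.
example_distances = {"DTW": 0.8, "SOSM": 1.4, "HMM": 0.5}
example_thresholds = {"DTW": 1.0, "SOSM": 1.0, "HMM": 1.0}
accepted = majority_decision(example_distances, example_thresholds)
```

With three methods in parallel, a strict majority means that at least two must accept, so a single failing classifier cannot reject a genuine speaker on its own, and a single accepting classifier cannot admit an impostor.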

[18]  Ying Dai,et al.  Extraction of Facial Images from the Complex Background Using Color Information and SGLD Matrices , 1996 .