Audio-visual imposture

A GMM-based audio-visual speaker verification system is described, and an Active Appearance Model combined with a linear speaker transformation system is used to evaluate the robustness of the verification. An Active Appearance Model (AAM) is used to automatically locate and track a speaker's face in a video recording. A Gaussian Mixture Model (GMM) based classifier (BECARS) is used for face verification. GMM training and testing are carried out on DCT-based features extracted from the detected faces. On the audio side, speech features are extracted and used for speaker verification with the same GMM-based classifier. Audio-visual speaker verification, obtained by fusing the audio and video modalities, is compared with face-only and speaker-only verification. To improve the robustness of the multimodal biometric identity verification system, an audio-visual imposture system is envisioned. It consists of an automatic voice transformation technique that an impostor may use to assume the identity of an authorized client. Features of the transformed voice are then combined with the corresponding appearance features and fed into the GMM-based system BECARS for training. An attempt is made to increase the impostor's acceptance rate and to analyze the robustness of the verification system. Experiments are being conducted on the BANCA database, with the prospect of experimenting on the newly developed PDAtabase, created within the scope of the SecurePhone project.
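The abstract does not say how the DCT-based face features are computed. A common approach, sketched here under that assumption, splits the AAM-normalized face image into non-overlapping 8x8 blocks and keeps the first few low-frequency 2-D DCT coefficients of each block in zigzag order, so that each face yields a set of feature vectors the GMM can treat much like speech frames. The block size, the choice of 15 coefficients and the function names are illustrative assumptions, not parameters of the described system.

```python
import math

# Sketch only: block size (8) and number of kept coefficients (15) are
# assumptions, not parameters reported for the described system.


def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def zigzag_indices(n):
    """(row, col) pairs of an n x n grid in JPEG-style zigzag order."""
    cells = [(i, j) for i in range(n) for j in range(n)]
    # Walk the anti-diagonals, alternating direction on odd/even diagonals.
    return sorted(cells, key=lambda p: (p[0] + p[1],
                                        p[0] if (p[0] + p[1]) % 2 else p[1]))


def block_dct_features(image, block=8, n_coeffs=15):
    """Hypothetical helper: one feature vector per non-overlapping tile of a
    grayscale face image, holding the tile's first n_coeffs zigzag DCT terms."""
    order = zigzag_indices(block)[:n_coeffs]
    feats = []
    for r in range(0, len(image) - block + 1, block):
        for c in range(0, len(image[0]) - block + 1, block):
            tile = [row[c:c + block] for row in image[r:r + block]]
            coeffs = dct_2d(tile)
            feats.append([coeffs[i][j] for i, j in order])
    return feats
```

The direct quadruple-loop DCT keeps the sketch dependency-free; a practical system would use a fast DCT routine, and might use overlapping blocks or drop the DC term to reduce sensitivity to illumination.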
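GMM-based verifiers of this kind usually derive each client model from a world model (UBM) and score an identity claim by the log-likelihood ratio between the two. The sketch below assumes that setup: diagonal covariances, means-only relevance-MAP adaptation with an assumed relevance factor of 16, and a UBM already trained by EM (not shown). It illustrates the general technique only; it is not the BECARS interface, and all function names are hypothetical.

```python
import math

# Sketch of a GMM-UBM verifier; not the BECARS API. A GMM is a tuple
# (weights, means, variances) with diagonal covariances.


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def frame_loglik(x, gmm):
    """log p(x | GMM)."""
    w, mu, var = gmm
    return logsumexp([math.log(wk) + log_gauss_diag(x, mk, vk)
                      for wk, mk, vk in zip(w, mu, var)])


def map_adapt_means(ubm, frames, relevance=16.0):
    """Relevance-MAP adaptation of the UBM means only; weights and
    variances are copied unchanged (an assumed, common configuration)."""
    w, mu, var = ubm
    K, D = len(w), len(mu[0])
    n = [0.0] * K                      # soft frame counts per component
    s = [[0.0] * D for _ in range(K)]  # posterior-weighted feature sums
    for x in frames:
        logp = [math.log(wk) + log_gauss_diag(x, mk, vk)
                for wk, mk, vk in zip(w, mu, var)]
        total = logsumexp(logp)
        for k in range(K):
            g = math.exp(logp[k] - total)  # responsibility of component k
            n[k] += g
            for d in range(D):
                s[k][d] += g * x[d]
    new_mu = []
    for k in range(K):
        a = n[k] / (n[k] + relevance)     # data-dependent adaptation weight
        ex = [s[k][d] / n[k] if n[k] > 0 else mu[k][d] for d in range(D)]
        new_mu.append([a * ex[d] + (1 - a) * mu[k][d] for d in range(D)])
    return (list(w), new_mu, [list(v) for v in var])


def llr_score(frames, client, ubm):
    """Average per-frame log-likelihood ratio of client model vs. UBM."""
    return sum(frame_loglik(x, client) - frame_loglik(x, ubm)
               for x in frames) / len(frames)
```

Components that see little client data keep a weight close to zero in the interpolation and stay near the UBM, which is what makes MAP adaptation usable with the few minutes of enrollment data typical of verification tasks.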
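The abstract does not specify the fusion rule. One simple score-level option, assumed here for illustration, min-max normalizes each modality's score using bounds estimated on development data, combines the two with a weighted sum, and thresholds the result. The helper `far_frr` shows how the impostor acceptance rate targeted by the imposture experiments could be measured at a given threshold. All names, weights and thresholds are hypothetical.

```python
# Hypothetical score-level fusion; weights and thresholds are assumptions.


def minmax_normalize(score, lo, hi):
    """Map a raw score into [0, 1] using bounds from development data."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def fuse(speech_score, face_score, speech_bounds, face_bounds, w_speech=0.5):
    """Weighted sum of the normalized speech and face scores."""
    s = minmax_normalize(speech_score, *speech_bounds)
    f = minmax_normalize(face_score, *face_bounds)
    return w_speech * s + (1.0 - w_speech) * f


def decide(fused, threshold=0.5):
    """Accept the identity claim if the fused score reaches the threshold."""
    return fused >= threshold


def far_frr(genuine, impostor, threshold):
    """False acceptance rate (impostors accepted) and false rejection rate
    (clients rejected) at a given decision threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr
```

In an imposture experiment, the transformed-voice trials would be scored as impostor attempts; a rise in the false acceptance rate at a fixed threshold indicates how much the attack weakens the system.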
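The imposture system relies on a linear transformation of the impostor's voice features toward those of the client. As a simplified sketch, assuming source and target feature frames have already been time-aligned (for example by dynamic time warping) and that each feature dimension is mapped independently, the affine coefficients can be fitted by least squares. The described system may use full transformation matrices or a different estimation procedure; the function names here are hypothetical.

```python
# Simplified (per-dimension) linear voice transformation; an assumption,
# not necessarily the transformation used in the described system.


def fit_diagonal_affine(src, tgt):
    """Least-squares fit of y_d = a_d * x_d + b_d for each dimension d,
    from time-aligned (impostor, client) feature frame pairs."""
    if len(src) != len(tgt) or len(src) < 2:
        raise ValueError("need at least two aligned frame pairs")
    n, D = len(src), len(src[0])
    a, b = [], []
    for d in range(D):
        xs = [f[d] for f in src]
        ys = [f[d] for f in tgt]
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        # Fall back to a pure offset if this dimension has no variance.
        ad = sxy / sxx if sxx > 0 else 1.0
        a.append(ad)
        b.append(my - ad * mx)
    return a, b


def transform(frames, a, b):
    """Apply the fitted per-dimension affine map to impostor frames."""
    return [[ad * x + bd for x, ad, bd in zip(f, a, b)] for f in frames]
```

Fitting one slope and offset per dimension avoids solving a full D-by-D linear system, at the cost of ignoring correlations between feature dimensions.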
