Automatic person recognition by acoustic and geometric features

This paper describes a multisensorial system that uses visual and acoustic cues jointly for person identification. A simple approach is presented, based on fusing the lists of scores produced independently by a speaker-recognition system and a face-recognition system. The reported experiments show that integrating visual and acoustic information improves both performance and reliability over either system alone. Finally, two network architectures, based on radial basis-function theory, are proposed to describe integration at various levels of abstraction.
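
The score-list fusion mentioned above can be illustrated with a minimal sketch. The abstract does not specify how the two score lists are normalized or combined, so the z-score normalization, the weighted sum, and the names `normalize`, `fuse_scores`, and `speaker_weight` are all assumptions made for illustration, not the authors' method.

```python
import math

def normalize(scores):
    """Rescale a {person: score} dict to zero mean and unit variance.

    Assumption: z-score normalization makes the two classifiers'
    scores comparable; the paper may use a different scheme.
    """
    mean = sum(scores.values()) / len(scores)
    var = sum((s - mean) ** 2 for s in scores.values()) / len(scores)
    std = math.sqrt(var) or 1.0  # avoid division by zero when all scores are equal
    return {person: (s - mean) / std for person, s in scores.items()}

def fuse_scores(speaker_scores, face_scores, speaker_weight=0.5):
    """Combine two independent score lists into one ranked list.

    Higher scores are assumed to mean a better match. Only people
    scored by both systems are ranked. `speaker_weight` is a
    hypothetical tuning parameter in [0, 1].
    """
    speaker = normalize(speaker_scores)
    face = normalize(face_scores)
    common = speaker.keys() & face.keys()
    fused = {
        person: speaker_weight * speaker[person]
        + (1.0 - speaker_weight) * face[person]
        for person in common
    }
    # Best candidate first.
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    speaker_scores = {"alice": 0.9, "bob": 0.8, "carol": 0.1}
    face_scores = {"alice": 0.6, "bob": 0.9, "carol": 0.2}
    for person, score in fuse_scores(speaker_scores, face_scores):
        print(f"{person}: {score:.3f}")
```

In this sketch each classifier still runs independently, and only their output lists are merged, which matches the "simple approach" described in the abstract at a high level. Setting `speaker_weight` to 1.0 or 0.0 recovers the ranking of a single system.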
