Locating and Tracking of Human Faces with Neural Networks

Effective Human-to-Human communication involves both auditory and visual modalities, providing robustness and naturalness in realistic communication situations. Recent efforts at our lab are aimed at providing such multimodal capabilities for human-machine communication as well by introducing gesture, character and speech recognition, eye-tracking and lipreading. Most of the visual modalities require a stable image of a speaker's face. In this technical report a connectionist face tracker is proposed that manipulates camera orientation and zoom, to keep a person's face located at all times in an image sequence. The system operates in real time and can adapt rapidly to different lighting conditions, different cameras and faces, making it robust against environmental variability. Extensions and integration of the system with a multimodal interface will be presented.
