LISTEN: a system for locating and tracking individual speakers

Both visual and acoustic information provide effective means of telecommunication between people. In this context, the face is the most important part of the person, both visually and acoustically. We describe how the cooperation of image and audio processing makes it possible to track a person's face and to capture the audio it produces. We present techniques for detecting regions of interest (e.g., moving regions of skin color), coupled with a neural-network-based face detector with a low false alarm rate, to locate and track faces. The system is connected to a nine-microphone array with adaptive beamforming, which performs beamforming immediately. Visual and acoustic information from the speaker's face is thus obtained in real time.
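
To illustrate how a face position found by the visual tracker could steer a microphone array, here is a minimal sketch. It is not the adaptive beamformer described above: it uses plain delay-and-sum, a simpler fixed technique, and it assumes a far-field source, a linear array along one axis, and a nominal speed of sound. The names `steering_delays` and `delay_and_sum`, and the array geometry, are hypothetical and chosen only for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed nominal value (not from the paper)


def steering_delays(mic_x, angle_deg, sample_rate):
    """Delays (in samples) that align a far-field plane wave arriving from
    angle_deg (0 = broadside) on a linear array with microphones at mic_x (m)."""
    s = math.sin(math.radians(angle_deg))
    # Arrival time of the wavefront at each microphone, relative to x = 0.
    arrival = [-x * s / SPEED_OF_SOUND for x in mic_x]
    latest = max(arrival)
    # Delay every channel so that it lines up with the latest arrival.
    return [(latest - t) * sample_rate for t in arrival]


def delay_and_sum(channels, delays):
    """Shift each channel by its delay (rounded to whole samples) and average."""
    shifts = [round(d) for d in delays]
    n = len(channels[0])
    out = [0.0] * n
    for ch, k in zip(channels, shifts):
        for i in range(k, n):
            out[i] += ch[i - k]
    return [v / len(channels) for v in out]


if __name__ == "__main__":
    # Hypothetical nine-microphone line array with 5 cm spacing.
    mics = [i * 0.05 for i in range(9)]
    # Suppose the face tracker reports the speaker at 30 degrees off broadside.
    print(steering_delays(mics, 30.0, 16000))
```

In such a setup the angle would come from the tracked face position in the image, so the array can be pointed at the speaker without first localizing the sound source acoustically.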
