A Panoramic Video and Acoustic Beamforming Sensor for

Videoconferencing systems in use today typically rely on fixed or pan/tilt/zoom cameras for image acquisition and on close-talking microphones for good-quality audio capture. These sensors are unsuitable for scenarios involving multiple users seated at a meeting table, or users who move around. In such situations, the focus of attention should shift from one talker to the next and, if possible, track moving users. This work describes a multi-modal perception system that uses both video and audio signals for such a videoconferencing system. An omnidirectional video camera and an audio beamforming array are combined into a single device placed at the center of a meeting table. The video and audio are processed to determine the direction of the current talker; a virtual perspective view and a directional audio beam are then created for that direction. Computer vision algorithms find people by motion and by face and marker detection. The audio beamformer merges the signals from a circular array of microphones to provide audio power measurements in different directions simultaneously. The video and audio cues are combined to decide the location of the talker. The system has been integrated with OpenH.323 and serves as a node using Microsoft NetMeeting.
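To make the idea of measuring audio power in several directions at once concrete, the following is a minimal sketch of a time-domain delay-and-sum beamformer for a circular microphone array. It is not the authors' implementation: the abstract does not say which beamforming method is used, and every name and parameter here (`circular_array`, `steered_power`, `scan_directions`, the eight microphones, the 0.2 m radius, the 48 kHz sample rate, the integer-sample delays, the far-field assumption) is an assumption chosen for illustration. `simulate_talker` only generates noise-free synthetic input so that the sketch can be run.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for room-temperature air


def circular_array(num_mics, radius_m):
    """Return (x, y) positions of microphones evenly spaced on a circle."""
    return [
        (radius_m * math.cos(2 * math.pi * m / num_mics),
         radius_m * math.sin(2 * math.pi * m / num_mics))
        for m in range(num_mics)
    ]


def arrival_shifts(mics, azimuth_rad, sample_rate_hz):
    """Integer sample offsets of a far-field wavefront arriving from azimuth_rad.

    A microphone nearer the source hears the wavefront earlier, so its
    offset relative to the array centre is negative.
    """
    ux, uy = math.cos(azimuth_rad), math.sin(azimuth_rad)
    return [
        round(-(x * ux + y * uy) / SPEED_OF_SOUND * sample_rate_hz)
        for x, y in mics
    ]


def steered_power(channels, shifts):
    """Mean power of the delay-and-sum output for one look direction."""
    margin = max(abs(s) for s in shifts)
    n_samples = len(channels[0])
    total = 0.0
    count = 0
    for n in range(margin, n_samples - margin):
        # Undo each channel's arrival offset so that sound from the
        # look direction adds coherently, then average the channels.
        y = sum(ch[n + s] for ch, s in zip(channels, shifts)) / len(channels)
        total += y * y
        count += 1
    return total / count


def scan_directions(channels, mics, sample_rate_hz, num_directions):
    """Steered power for evenly spaced azimuths covering 360 degrees."""
    return [
        steered_power(
            channels,
            arrival_shifts(mics, 2 * math.pi * k / num_directions, sample_rate_hz),
        )
        for k in range(num_directions)
    ]


def simulate_talker(mics, azimuth_rad, sample_rate_hz, n_samples, seed=0):
    """Synthetic input: one broadband far-field source, no noise or reverberation."""
    rng = random.Random(seed)
    source = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    channels = []
    for s in arrival_shifts(mics, azimuth_rad, sample_rate_hz):
        channels.append([
            source[n - s] if 0 <= n - s < n_samples else 0.0
            for n in range(n_samples)
        ])
    return channels


if __name__ == "__main__":
    mics = circular_array(num_mics=8, radius_m=0.2)
    talker_azimuth = 2 * math.pi * 13 / 36  # hypothetical talker direction
    channels = simulate_talker(mics, talker_azimuth, 48000, 2048)
    powers = scan_directions(channels, mics, 48000, num_directions=36)
    loudest = max(range(len(powers)), key=powers.__getitem__)
    print("strongest beam at", loudest * 10, "degrees")
```

In a combined system, the per-direction power list could be compared with the directions at which the video processing finds people. How the two cues are actually weighted is not specified in the abstract, so that step is left out of the sketch.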
