This demo presents a real-time system for analyzing group meetings. Targeting round-table meetings, the system employs an omnidirectional camera-microphone system. Its goal is to automatically discover "who is talking to whom and when". To that end, the face pose and position of the meeting participants are tracked on panoramic images acquired from fisheye-based omnidirectional cameras. From the audio signals obtained with a microphone array, speaker diarization, i.e. the estimation of "who is speaking and when", is carried out. The visual focus of attention, i.e. "who is looking at whom", is estimated from the face-tracking results. The results are displayed using a 3D visualization scheme. The main advantage of our system is its real-time operation. We will demonstrate a portable version of the system consisting of two laptop PCs. In addition, we will showcase our meeting playback viewer, whose man-machine interfaces allow users to freely control the space and time of meeting scenes. With this viewer, users can also experience 3D positional sound effects linked to the 3D viewpoint, using enhanced audio tracks for each participant.
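
As a rough illustration of how the per-frame outputs described above might be combined, the following sketch fuses hypothetical face-tracking results (seat position and gaze direction as azimuths on the panorama) with a diarization output (the set of currently active speakers) to label "who is talking to whom". This is a minimal sketch under simplifying assumptions, not the authors' implementation; all names, data structures, and the angular-matching heuristic are illustrative.

```python
# Illustrative sketch only: fuse face tracking + speaker diarization
# into "who is talking to whom" labels for a single video frame.
# Simplification: gaze is expressed as the azimuth of the point being
# looked at, so it can be compared directly with other seats' azimuths.

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple
import math


@dataclass
class FaceState:
    """Hypothetical tracker output: seat azimuth and gaze azimuth (radians)."""
    position_azimuth: float
    gaze_azimuth: float


def focus_of_attention(faces: Dict[str, FaceState],
                       tolerance: float = math.radians(15)) -> Dict[str, Optional[str]]:
    """Assign each participant the person whose seat best matches their gaze direction."""
    focus: Dict[str, Optional[str]] = {}
    for pid, face in faces.items():
        best, best_err = None, tolerance
        for other_id, other in faces.items():
            if other_id == pid:
                continue
            # Wrapped angular difference between gaze and the other seat, in [0, pi].
            diff = other.position_azimuth - face.gaze_azimuth
            err = abs((diff + math.pi) % (2 * math.pi) - math.pi)
            if err < best_err:
                best, best_err = other_id, err
        focus[pid] = best
    return focus


def who_talks_to_whom(faces: Dict[str, FaceState],
                      active_speakers: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Combine diarization (who is speaking) with visual focus (who looks at whom)."""
    focus = focus_of_attention(faces)
    return [(speaker, focus.get(speaker)) for speaker in active_speakers]


if __name__ == "__main__":
    # Toy frame: three participants around a table; A is speaking and looking toward B.
    frame = {
        "A": FaceState(position_azimuth=0.0, gaze_azimuth=math.radians(120)),
        "B": FaceState(position_azimuth=math.radians(120), gaze_azimuth=0.0),
        "C": FaceState(position_azimuth=math.radians(240), gaze_azimuth=math.radians(120)),
    }
    print(who_talks_to_whom(frame, active_speakers=["A"]))  # -> [('A', 'B')]
```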