Development of automatic video editing system based on stereo-based head tracking for multiparty conversations

This paper presents an automatic video editing system for multiparty conversations based on head tracking. Archiving meetings is attracting considerable interest. Conventional systems use a fixed-viewpoint camera and simple camera selection based on participants' utterances, but they fail to adequately convey to viewers who is talking to whom. We focus on the participants' head orientation, since this information is useful for detecting the speaker and the person the speaker is addressing. To estimate each participant's head orientation automatically, our system combines modules for stereo-based head tracking. The system selects the shot of the participant whom most participants are looking at, based on a majority decision. Experiments on several three-participant conversations confirm the effectiveness of our system.
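
The majority-decision shot selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the head tracker has already been reduced to one gaze target per participant (or `None` when no target is detected), and the function name `select_shot`, the input format, and the tie-breaking rule (keep the current shot, otherwise pick the alphabetically first candidate) are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Dict, Optional


def select_shot(
    gaze_targets: Dict[str, Optional[str]],
    current_shot: Optional[str] = None,
) -> Optional[str]:
    """Pick the participant whom most participants are looking at.

    gaze_targets maps each participant to the participant they are
    looking at, or None if no target was detected (assumed input format).
    """
    # Each participant casts one vote for the person they look at;
    # missing targets and self-targets are ignored.
    votes = Counter(
        target
        for viewer, target in gaze_targets.items()
        if target is not None and target != viewer
    )
    if not votes:
        # Nobody is looking at anyone: keep whatever shot is active.
        return current_shot

    top = max(votes.values())
    winners = sorted(name for name, count in votes.items() if count == top)

    # Assumed tie-break: avoid a needless cut if the current shot is tied.
    if current_shot in winners:
        return current_shot
    return winners[0]


# Three-participant example: A and C look at B, B looks at A.
print(select_shot({"A": "B", "B": "A", "C": "B"}))
```

In a three-participant conversation, two votes always form a majority, so ties arise only when a gaze target is missing or every participant looks at a different person; a real system would likely also smooth the decision over time to avoid rapid cuts.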
