Scene Duplicate Detection from News Videos Using Image-Audio Matching Focusing on Human Faces

As a tool for structuring massive volumes of archived news videos according to their semantic content, this paper proposes a method for detecting scene duplicates in news videos. A scene duplicate is a pair of video segments taken at the same event from different viewpoints. The audio channel is effective for detecting scene duplicates regardless of viewpoint, but it cannot be relied on when external audio sources (e.g., narration or sound effects) overlap the original audio. In contrast, the image channel is useful in most cases, although significant differences in viewpoint degrade detection. The proposed method integrates information from these two channels to improve the accuracy of scene duplicate detection in news videos. We evaluated the method in an experiment on actual broadcast news videos and obtained higher detection accuracy in both recall and precision, which confirms its effectiveness.
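The evaluation above is reported in terms of recall and precision. As an illustration only, the following minimal sketch shows how these two measures could be computed for detected scene-duplicate pairs, assuming each video segment has an identifier and that a duplicate is an unordered pair of identifiers. The function name `precision_recall` and the toy segment identifiers are hypothetical and do not come from the paper.

```python
def precision_recall(detected, ground_truth):
    """Return (precision, recall) for collections of unordered segment pairs."""
    # Normalize each pair so that (a, b) and (b, a) count as the same duplicate.
    det = {frozenset(pair) for pair in detected}
    gt = {frozenset(pair) for pair in ground_truth}

    true_positives = len(det & gt)

    # Guard against division by zero when either set is empty.
    precision = true_positives / len(det) if det else 0.0
    recall = true_positives / len(gt) if gt else 0.0
    return precision, recall


# Hypothetical toy data: segment identifiers are made up for illustration.
detected = [("s1", "s7"), ("s2", "s9"), ("s4", "s5")]
ground_truth = [("s7", "s1"), ("s2", "s9"), ("s3", "s8"), ("s6", "s10")]

precision, recall = precision_recall(detected, ground_truth)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

In this toy example, two of the three detected pairs are correct and two of the four true pairs are found, so precision is 2/3 and recall is 1/2.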