Highlight detection for video content analysis through double filters

Highlight detection is a form of video summarization that aims to capture the most expressive or attractive parts of a video. Most research on video highlight selection has focused on sports video, detecting specific objects or events such as goals in soccer or touchdowns in football. In this paper, we present a highlight detection method for film video. Unlike a sports highlight, a highlight in a film is not usually tied to specific objects or events. Our method for determining a highlight in a film has three steps: (a) locating an obvious audio event, (b) detecting expressive visual content around that audio location, and (c) selecting the preferred portion of the extracted audio-visual highlight segments. We define a double-filter model to detect potential highlights in video. First, obvious audio locations are determined by filtering salient audio features; then, potential visual saliency is detected around each potential audio highlight location. Finally, the result of the audio-visual double filters is compared with a preference threshold to determine the final highlights. The user study results indicate that the double-filter approach is an effective method for highlight detection in video content analysis.
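The three-step pipeline described above can be illustrated with a minimal Python sketch. The abstract does not specify the audio or visual features, the neighborhood size, or how the two filter outputs are combined, so everything here is an assumption made for illustration: the function name `detect_highlights`, the per-frame scores normalized to [0, 1], the `window` parameter, and the use of a simple product of the audio score and the peak nearby visual score as the value compared with the preference threshold.

```python
from typing import List, Tuple


def detect_highlights(
    audio_scores: List[float],
    visual_scores: List[float],
    audio_threshold: float,
    preference_threshold: float,
    window: int = 2,
) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) segments that pass both filters.

    All names and the scoring rule are hypothetical; scores are assumed
    to be per-frame (or per-shot) values in [0, 1].
    """
    if len(audio_scores) != len(visual_scores):
        raise ValueError("audio and visual score sequences must align")

    n = len(audio_scores)
    highlights = []
    for i, audio in enumerate(audio_scores):
        # Filter 1: keep only locations with an obvious audio event.
        if audio < audio_threshold:
            continue

        # Filter 2: look for visual saliency around the audio location.
        start = max(0, i - window)
        end = min(n, i + window + 1)  # exclusive end index
        visual = max(visual_scores[start:end])

        # Assumed combination rule: product of the two filter outputs,
        # compared against the preference threshold.
        score = audio * visual
        if score >= preference_threshold:
            highlights.append((start, end, score))
    return highlights


if __name__ == "__main__":
    audio = [0.1, 0.9, 0.2, 0.1, 0.8, 0.1]
    visual = [0.2, 0.3, 0.9, 0.1, 0.1, 0.2]
    for segment in detect_highlights(audio, visual, 0.5, 0.5, window=1):
        print(segment)
```

In this toy input, both index 1 and index 4 pass the audio filter, but only the neighborhood of index 1 contains strong visual saliency, so only that segment survives the preference threshold. A real system would likely merge overlapping segments and derive the scores from actual audio and image features.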
