Procedure for audio-assisted browsing of news video using generalized sound recognition

In Casey describes a generalized sound recognition framework based on reduced rank spectra and Minimum-Entropy Priors. This approach enables successful recognition of a wide variety of sounds such as male speech, female speech, music, animal sounds etc. In this work, we apply this recognition framework to news video to enable quick video browsing. We identify speaker change positions in the broadcast news using the sound recognition framework. We combine the speaker change position with color & motion cues from video and are able to locate the beginning of each of the topics covered by the news video. We can thus skim the video by merely playing a small portion starting from each of the locations where one of the principal cast begins to speak. In combination with our motion-based video browsing approach, our technique provides simple automatic news video browsing. While similar work has been done before, our approach is simpler and faster than competing techniques, and provides a rich framework for further analysis and description of content.

[1]  Alan Hanjalic,et al.  DANCERS: Delft advanced news retrieval system , 2001, IS&T/SPIE Electronic Imaging.

[2]  M. Casey,et al.  MPEG-7 sound-recognition tools , 2001, IEEE Trans. Circuits Syst. Video Technol..

[3]  A. Divakaran,et al.  Content-based browsing system for personal video recorders , 2002, 2002 Digest of Technical Papers. International Conference on Consumer Electronics (IEEE Cat. No.02CH37300).

[4]  Zhu Liu,et al.  Multimedia content analysis-using both audio and visual clues , 2000, IEEE Signal Process. Mag..