Video Summaries through Mosaic-Based Shot and Scene Clustering

We present an approach for compact video summaries that allows fast and direct access to video data. The video is segmented into shots and, in appropriate video genres, into scenes, using previously proposed methods. A new concept that supports the hierarchical representation of video is presented, and is based on physical setting and camera locations. We use mosaics to represent and cluster shots, and detect appropriate mosaics to represent scenes. In contrast to approaches to video indexing which are based on key-frames, our efficient mosaic-based scene representation allows fast clustering of scenes into physical settings, as well as further comparison of physical settings across videos. This enables us to detect plots of different episodes in situation comedies and serves as a basis for indexing whole video sequences. In sports videos where settings are not as well defined, our approachallo ws classifying shots for characteristic event detection. We use a novel method for mosaic comparison and create a highly compact non-temporal representation of video. This representation allows accurate comparison of scenes across different videos and serves as a basis for indexing video libraries.

[1]  D. Arijon,et al.  Grammar of Film Language , 1976 .

[2]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[3]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[4]  Minerva M. Yeung,et al.  Efficient matching and clustering of video shots , 1995, Proceedings., International Conference on Image Processing.

[5]  Shih-Fu Chang,et al.  Scene change detection in an MPEG-compressed video sequence , 1995, Electronic Imaging.

[6]  P. Anandan,et al.  Mosaic based representations of video sequences and their applications , 1995, Proceedings of IEEE International Conference on Computer Vision.

[7]  P. Anandan,et al.  Efficient representations of video sequences and their applications , 1996, Signal Process. Image Commun..

[8]  Boon-Lock Yeo,et al.  Time-constrained clustering for segmentation of video into story units , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[9]  Richard Szeliski,et al.  Creating full view panoramic image mosaics and environment maps , 1997, SIGGRAPH.

[10]  Rainer Lienhart,et al.  Comparison of automatic shot boundary detection algorithms , 1998, Electronic Imaging.

[11]  John R. Kender,et al.  Video scene segmentation via continuous video coherence , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[12]  Nuno Vasconcelos,et al.  A spatiotemporal motion model for video summarization , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[13]  Patrick Bouthemy,et al.  Determining a Structured Spatio-Temporal Representation of Video Content for Efficient Visualization and Indexing , 1998, ECCV.

[14]  Michal Irani,et al.  Video indexing based on mosaic representations , 1998, Proc. IEEE.

[15]  Alan Hanjalic,et al.  Automated high-level movie segmentation for advanced video-retrieval systems , 1999, IEEE Trans. Circuits Syst. Video Technol..

[16]  Shmuel Peleg,et al.  Rectified mosaicing: mosaics without the curl , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[17]  Surya Nepal,et al.  Automatic detection of 'Goal' segments in basketball videos , 2001, MULTIMEDIA '01.

[18]  John R. Kender,et al.  A unified memory based approach to cut, dissolve, key frame and scene analysis , 2001, Proceedings 2001 International Conference on Image Processing (Cat. No.01CH37205).

[19]  Andrew Zisserman,et al.  Viewpoint invariant texture matching and wide baseline stereo , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.