Exploring video structure beyond the shots

While existing shot-based video analysis approaches give users better access to a video than the raw data stream does, they are still not sufficient for meaningful video browsing and retrieval, because (1) a long video still contains too many shots to present to the user, and (2) shots do not capture the underlying semantic structure of the video, which is the structure along which the user may wish to browse or retrieve it. To explore video structure at the semantic level, this paper presents an effective approach to constructing video scene structure, in which shots are grouped into semantically related scenes. The output of the proposed algorithm is a structured video that greatly facilitates users' access. Experiments on real-world movie videos validate the effectiveness of the proposed approach.
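To make the idea of grouping shots into scenes concrete, here is a minimal sketch of one generic shot-linking scheme. It is not the algorithm proposed in this paper. It assumes that each shot is summarized by a normalized color histogram of one key frame, that shot similarity is histogram intersection attenuated exponentially by the time gap between shots, and that two shots within a small look-ahead window that are similar enough "link" every shot between them into the same scene. All names (`Shot`, `shot_similarity`, `group_into_scenes`) and parameter values (`decay`, `threshold`, `window`) are hypothetical choices for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Shot:
    start: float        # shot start time in seconds
    end: float          # shot end time in seconds
    hist: list[float]   # normalized color histogram of a key frame (assumed feature)


def hist_intersection(a: list[float], b: list[float]) -> float:
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def shot_similarity(a: Shot, b: Shot, decay: float = 30.0) -> float:
    """Visual similarity attenuated by the temporal gap from shot a to later shot b."""
    gap = max(0.0, b.start - a.end)
    return hist_intersection(a.hist, b.hist) * math.exp(-gap / decay)


def group_into_scenes(shots: list[Shot], threshold: float = 0.6,
                      window: int = 3) -> list[list[Shot]]:
    """Group consecutive shots into scenes via look-ahead shot links.

    If shot i is similar to some shot j within `window` shots ahead,
    shots i..j are kept in the same scene. A scene boundary is placed
    after shot k only when no link spans the gap between k and k + 1.
    """
    n = len(shots)
    reach = list(range(n))  # reach[i]: furthest shot linked from shot i
    for i in range(n):
        for j in range(i + 1, min(n, i + window + 1)):
            if shot_similarity(shots[i], shots[j]) >= threshold:
                reach[i] = j
    scenes, start, furthest = [], 0, 0
    for k in range(n):
        furthest = max(furthest, reach[k])
        if furthest == k:  # nothing links across the k / k+1 gap
            scenes.append(shots[start:k + 1])
            start = k + 1
    return scenes


if __name__ == "__main__":
    a, b, c = [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]
    # An alternating A/B "dialogue" followed by two similar C shots.
    shots = [Shot(0, 5, a), Shot(5, 10, b), Shot(10, 15, a),
             Shot(15, 20, b), Shot(20, 25, c), Shot(25, 30, c)]
    print([len(s) for s in group_into_scenes(shots)])  # [4, 2]
```

The look-ahead links are what let an alternating dialogue (A, B, A, B) stay in one scene even though adjacent shots look different. Comparing only neighboring shots would split it into four scenes.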
