Video browsing using clustering and scene transitions on compressed sequences

This paper describes a new technique for extracting a hierarchical decomposition of a complex video selection for browsing purposes. The technique combines visual and temporal information to capture the important relations within a scene and between scenes in a video, thus allowing the underlying story structure to be analyzed with no a priori knowledge of the content. We define a general model of the hierarchical scene transition graph and apply this model in an implementation for browsing. Video shots are first identified, and a collection of key frames is used to represent each shot. These collections are then classified according to gross visual information. A platform is built on which the video is presented to the user as directed graphs, with each node representing a category of video shots and each edge denoting a temporal relationship between categories. The analysis and processing are carried out directly on the compressed video. Preliminary tests show that the narrative structure of a video selection can be effectively captured using this technique.
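The graph construction described above can be illustrated with a minimal sketch. It assumes that shot detection and visual classification have already been done, so each shot arrives with a category label in temporal order. The function name `build_scene_transition_graph`, the label values, and the rule that an edge joins two categories whenever a shot of one is immediately followed by a shot of the other are illustrative assumptions, not details taken from the paper.

```python
from collections import defaultdict


def build_scene_transition_graph(shot_labels):
    """Build a directed graph from per-shot category labels.

    shot_labels: category labels, one per shot, in temporal order
                 (hypothetical input; assumed to come from a prior
                 shot-detection and classification step).
    Returns (nodes, edges), where nodes maps each category to the
    indices of its shots and edges is a set of (source, target) pairs.
    """
    nodes = defaultdict(list)
    for shot_index, label in enumerate(shot_labels):
        nodes[label].append(shot_index)

    edges = set()
    # Assumption: a temporal relationship exists between two categories
    # when a shot of the first is directly followed by a shot of the second.
    for current, following in zip(shot_labels, shot_labels[1:]):
        if current != following:
            edges.add((current, following))

    return dict(nodes), edges


if __name__ == "__main__":
    # Example: a dialogue alternating between categories A and B,
    # followed by two shots in category C and one in category D.
    nodes, edges = build_scene_transition_graph(
        ["A", "B", "A", "B", "C", "C", "D"]
    )
    print(nodes)
    print(sorted(edges))
```

In this example the alternating A and B shots collapse into two nodes joined by edges in both directions, which is the kind of compact view of a scene that the directed-graph presentation is meant to give.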