Novel technique for automatic key frame computing

In general, video shots need to be clustered to form more semantically significant units, such as scenes, sequences, and programs. This is known as story-based video structuring. Automatic video structuring is of great importance for video browsing and retrieval. Shots or scenes are usually described by one or several representative frames, called key frames. Viewed at a higher level, the key frames of some shots may be semantically redundant. In this paper, we propose automatic solutions to the problems of key frame computing and key frame pruning. We develop an original image similarity criterion that considers both the spatial layout and the detail content of an image. Coefficients of the wavelet decomposition are used to derive parameter vectors accounting for these two aspects. The parameters exhibit (quasi-)invariant properties. The novel "Seek and Spread" strategy used in key frame computing allows us to obtain a large representative range for the key frames. Inter-shot redundancy of the key frames is suppressed using the same image similarity measure. Experimental results demonstrate the effectiveness and efficiency of our techniques.
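As an illustration only, the following minimal sketch shows one way a wavelet-based similarity measure of this kind, and its use for pruning redundant key frames, could be set up. It is not the criterion developed in the paper: the one-level Haar transform, the choice of the coarse approximation band as the "layout" vector, the mean absolute detail coefficients as the "detail" vector, the weights, and the names haar2d, features, distance and prune are all assumptions made for this example. Images are plain lists of lists of gray levels whose sides are divisible by 2**levels.

```python
def haar2d(img):
    """One level of an (unnormalized) 2-D Haar decomposition.
    Returns the approximation band and three detail bands."""
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, len(img), 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            row_ll.append((a + b + d + e) / 4.0)  # local average
            row_lh.append((a - b + d - e) / 4.0)  # left-right difference
            row_hl.append((a + b - d - e) / 4.0)  # top-bottom difference
            row_hh.append((a - b - d + e) / 4.0)  # diagonal difference
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


def features(img, levels=2):
    """Assumed feature split: the coarse approximation stands in for spatial
    layout, mean absolute detail coefficients stand in for detail content."""
    layout, detail = img, []
    for _ in range(levels):
        layout, lh, hl, hh = haar2d(layout)
        for band in (lh, hl, hh):
            count = len(band) * len(band[0])
            detail.append(sum(abs(v) for row in band for v in row) / count)
    return [v for row in layout for v in row], detail


def distance(img_a, img_b, w_layout=0.5, w_detail=0.5, levels=2):
    """Weighted mean absolute difference of the two feature vectors
    (hypothetical weights; 0.0 means identical features)."""
    layout_a, detail_a = features(img_a, levels)
    layout_b, detail_b = features(img_b, levels)
    d_layout = sum(abs(x - y) for x, y in zip(layout_a, layout_b)) / len(layout_a)
    d_detail = sum(abs(x - y) for x, y in zip(detail_a, detail_b)) / len(detail_a)
    return w_layout * d_layout + w_detail * d_detail


def prune(key_frames, threshold):
    """Greedy pruning: keep a key frame only if it is farther than
    `threshold` from every key frame already kept."""
    kept = []
    for frame in key_frames:
        if all(distance(frame, k) > threshold for k in kept):
            kept.append(frame)
    return kept
```

For example, calling prune on a list of key frames gathered from several shots with a small threshold would drop frames whose layout and detail vectors nearly coincide with those of a frame already kept, which is the role the similarity measure plays in suppressing inter-shot redundancy. The threshold and weights would have to be tuned on real data.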