Key-frame extraction for object-based video segmentation

We propose a coherent approach to extracting key-frames within a video shot for object-based video segmentation. A unified feature space is first constructed to represent video frames and visual objects jointly in the spatio-temporal domain. Key-frame extraction is then formulated as a feature selection problem: choose the set of key-frames that maximizes the divergence between video-object clusters. Two criteria are used to achieve joint key-frame extraction and object segmentation. The first selects key-frames that maximize the pairwise interclass divergence between objects in the feature space. The second selects key-frames that maximize the marginal divergence of objects within each frame. Simulations on both synthetic and real video data demonstrate the efficiency and robustness of the proposed methods.
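The two criteria can be illustrated with a minimal sketch. It assumes that each video object is modeled as a Gaussian over a feature vector with one dimension per frame, so selecting key-frames amounts to selecting dimensions. It also assumes that symmetric Kullback-Leibler divergence is the divergence measure. The function names (`gaussian_kl`, `pairwise_divergence`, `select_pairwise`, `select_marginal`) are hypothetical. The greedy forward search and the per-frame score standing in for the marginal criterion are illustrative choices, not the paper's exact formulation.

```python
import itertools

import numpy as np


def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL(N(mu0, cov0) || N(mu1, cov1)) for multivariate Gaussians."""
    d = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(cov1_inv @ cov0) + diff @ cov1_inv @ diff
                  - d + logdet1 - logdet0)


def symmetric_kl(mu0, cov0, mu1, cov1):
    """Symmetric KL (J-divergence) between two Gaussians."""
    return gaussian_kl(mu0, cov0, mu1, cov1) + gaussian_kl(mu1, cov1, mu0, cov0)


def pairwise_divergence(means, covs, frames):
    """Sum of symmetric KL over all object pairs, restricted to `frames`.

    Assumed layout: means is (K, F), one row per object and one column
    per frame; covs is (K, F, F), one covariance matrix per object.
    """
    idx = np.asarray(frames, dtype=int)
    sub = np.ix_(idx, idx)  # rows/cols of the selected frames
    total = 0.0
    for i, j in itertools.combinations(range(means.shape[0]), 2):
        total += symmetric_kl(means[i, idx], covs[i][sub],
                              means[j, idx], covs[j][sub])
    return total


def select_pairwise(means, covs, m):
    """Criterion 1 (sketch): greedy forward search on the joint pairwise divergence."""
    selected, remaining = [], list(range(means.shape[1]))
    for _ in range(m):
        # Add the frame that most increases divergence of the joint subset.
        best = max(remaining,
                   key=lambda f: pairwise_divergence(means, covs, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


def select_marginal(means, covs, m):
    """Criterion 2 (sketch): score each frame on its own and keep the top m."""
    scores = np.array([pairwise_divergence(means, covs, [f])
                       for f in range(means.shape[1])])
    return sorted(int(f) for f in np.argsort(scores)[::-1][:m])


if __name__ == "__main__":
    # Hypothetical toy data: 3 objects, 5 frames; only frames 2 and 4
    # separate the objects, frame 2 more strongly than frame 4.
    means = np.zeros((3, 5))
    means[:, 2] = [0.0, 3.0, 6.0]
    means[:, 4] = [0.0, 2.0, 4.0]
    covs = np.stack([np.eye(5)] * 3)
    print(select_pairwise(means, covs, 2))  # -> [2, 4]
    print(select_marginal(means, covs, 2))  # -> [2, 4]
```

With diagonal (here identity) covariances, the joint divergence is additive across frames, so both criteria pick the same key-frames. They can disagree once the object models carry inter-frame correlation. In that case, the joint criterion needs a search over subsets, while the per-frame criterion only needs a sort.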
