Joint Key-Frame Extraction and Object-Based Video Segmentation

In this paper, we propose a coherent framework for joint key-frame extraction and object-based video segmentation. Conventional key-frame extraction and object segmentation are usually performed independently because they operate at different semantic levels, which ignores the inherent relationship between key-frames and objects. The proposed method extracts a small number of key-frames within a shot such that the divergence between video objects in a feature space is maximized, which supports robust and efficient object segmentation. The method exploits the advantages of both temporal and object-based video segmentation and helps build a unified framework for content-based analysis and structured video representation. Theoretical analysis and simulation results on both synthetic and real video sequences demonstrate the efficiency and robustness of the proposed method.
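To make the key-frame selection criterion above more concrete, here is a minimal Python sketch that treats each frame as a "feature" and greedily picks the k frames that maximize the divergence between object classes. Several details are assumptions made for illustration, not the paper's exact method. Each object is modelled by a single Gaussian over per-pixel values. Object labels from a provisional segmentation are assumed to be available. The divergence is the symmetric Kullback-Leibler (Jeffreys) divergence summed over object pairs. A plain greedy forward search stands in for whatever search strategy the authors actually use. The function names are hypothetical.

```python
import itertools

import numpy as np


def gaussian_j_divergence(mu1, cov1, mu2, cov2):
    """Symmetric KL (Jeffreys) divergence between two multivariate Gaussians."""
    d = mu1.shape[0]
    inv1 = np.linalg.inv(cov1)
    inv2 = np.linalg.inv(cov2)
    diff = (mu1 - mu2).reshape(-1, 1)
    # J = KL(1||2) + KL(2||1); the log-determinant terms cancel in the sum.
    trace_term = np.trace(inv2 @ cov1) + np.trace(inv1 @ cov2) - 2 * d
    mahal_term = float((diff.T @ (inv1 + inv2) @ diff).item())
    return 0.5 * (trace_term + mahal_term)


def class_divergence(X, y, frames, reg=1e-6):
    """Sum of pairwise J-divergences between object classes using only `frames`.

    Assumption: X has shape (n_pixels, n_frames) and y holds a provisional
    object label per pixel; each object is modelled by one Gaussian.
    """
    Xs = X[:, frames]
    models = []
    for c in np.unique(y):
        Xc = Xs[y == c]
        mu = Xc.mean(axis=0)
        # Small ridge keeps the covariance invertible.
        cov = np.atleast_2d(np.cov(Xc, rowvar=False)) + reg * np.eye(len(frames))
        models.append((mu, cov))
    return sum(
        gaussian_j_divergence(m1, c1, m2, c2)
        for (m1, c1), (m2, c2) in itertools.combinations(models, 2)
    )


def select_key_frames(X, y, k):
    """Greedy forward selection of k frames that maximize class divergence."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        # Add the frame whose inclusion yields the largest total divergence.
        best = max(remaining, key=lambda f: class_divergence(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)


if __name__ == "__main__":
    # Synthetic shot: two objects, six frames; the objects differ only in frames 2 and 4.
    rng = np.random.default_rng(0)
    n = 200
    y = np.repeat([0, 1], n)
    X = rng.normal(size=(2 * n, 6))
    X[y == 1, 2] += 5.0
    X[y == 1, 4] += 3.0
    print(select_key_frames(X, y, 2))
```

On this synthetic shot, the sketch should pick frames 2 and 4, the only frames in which the two objects are separable. Greedy forward search is used only because it is short. A floating search or an iterative scheme that re-estimates the object labels after each selection would be a natural refinement.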
