Automatic Moving Object Extraction toward Content-Based Video Representation and Indexing

In this paper, we present a novel technique for semantic video object generation and temporal tracking that supports content-based video representation and indexing. In our system, homogeneous image regions with accurate boundaries are first obtained by integrating the results of color edge detection and similarity-based region growing. Semantic objects that are meaningful to human users are then generated by either an automatic seeded region aggregation procedure or human interaction. The generated semantic objects are tracked along the time axis to exploit their temporal correspondences across frames. Video sources in databases can then be accessed through these individual video objects, each of which is represented by a set of visual features such as its trajectory, boundary pixels, region area, shape, color histogram, orientation, and regularity. A learning-based optimization procedure selects suitable weighting coefficients for the feature dimensions, so that feature-based object similarity corresponds directly to concept-based similarity. Moreover, we propose a seeded semantic video content clustering technique that provides cluster-based hierarchical video indexing, retrieval, and browsing.
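
The role of the per-dimension weighting coefficients can be illustrated with a minimal Python sketch. It assumes a weighted Euclidean distance over normalized feature vectors, which the abstract does not confirm; the function name, the three-feature layout, and the hand-picked weights are all hypothetical, and the learning-based procedure that would actually select the weights is not reproduced here.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors.

    Assumption: the distance form and the weights are illustrative only;
    they are not taken from the paper.
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("feature vectors and weights must have equal length")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


# Hypothetical normalized features: [area, orientation, regularity].
query = [0.50, 0.20, 0.80]
candidate = [0.50, 0.90, 0.80]

# With equal weights, the orientation difference dominates the distance.
uniform = weighted_distance(query, candidate, [1.0, 1.0, 1.0])

# If orientation were judged irrelevant to the concept, its weight would
# shrink toward zero and the two objects would be treated as similar.
downweighted = weighted_distance(query, candidate, [1.0, 0.0, 1.0])

print(uniform, downweighted)
```

In this toy case the two objects differ only in orientation, so setting that weight to zero makes them indistinguishable. This is the sense in which the choice of weights can bring feature-based similarity closer to concept-based similarity.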
