Content-Based Access to Video Objects: Temporal Segmentation, Visual Summarization, and Feature Extraction

Abstract

The classical approach to content-based video access has been ‘frame-based’. It consists of shot boundary detection, followed by selection of key frames that characterize the visual content of each shot, and then clustering of camera shots to form story units. In an object-based multimedia environment, however, content-based random access to individual video objects becomes a desirable feature. To this end, this paper introduces an ‘object-based’ approach to temporal video partitioning and content-based indexing, in which the basic indexing unit is the ‘lifespan of a video object’ rather than a ‘camera shot’ or a ‘story unit’. We propose to represent each video object by an adaptive 2D triangular mesh. A mesh-based object tracking scheme then computes the motion trajectories of all mesh node points until the object exits the field of view. A new similarity measure, based on motion discontinuities and shape changes of the tracked object, is defined to detect content changes, which partition the lifespan into temporal segments. A set of ‘key snapshots’, constituting a visual summary of the object’s lifespan, is selected automatically. These key snapshots are then animated along the tracked motion trajectories to give a moving visual representation of objects of interest. The proposed scheme supports functionalities such as object-based search and browsing for interactive video retrieval, surveillance video analysis, and object-based content manipulation and editing for studio postprocessing and desktop multimedia authoring. The approach is applicable to any video in which the initial appearance of the object(s) can be specified and the object motion can be modeled by a piecewise affine transformation. The system is demonstrated on three types of video: virtual studio productions (composited video), surveillance video, and TV broadcast video.
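The abstract assumes that object motion can be modeled as piecewise affine over the mesh triangles. The following is a minimal sketch of that idea, not the authors' implementation. It shows how the six affine parameters of one triangle can be recovered exactly from the tracked positions of its three nodes in two frames. All function names are hypothetical. `affine_change` in particular is NOT the paper's similarity measure. It is a simple stand-in showing how changes in per-triangle parameters between consecutive frame pairs could flag a motion discontinuity.

```python
from typing import List, Tuple

Point = Tuple[float, float]
# Affine parameters (a1..a6): x' = a1*x + a2*y + a3, y' = a4*x + a5*y + a6
Affine = Tuple[float, float, float, float, float, float]


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def triangle_affine(src: List[Point], dst: List[Point]) -> Affine:
    """Hypothetical helper: exact affine map taking the three nodes of a mesh
    triangle in one frame (src) to their tracked positions in the next (dst),
    solved with Cramer's rule."""
    m = [[x, y, 1.0] for x, y in src]
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate (collinear) triangle")

    def solve(rhs: List[float]) -> List[float]:
        # Replace each column of m by rhs in turn (Cramer's rule).
        coeffs = []
        for col in range(3):
            mc = [row[:] for row in m]
            for r in range(3):
                mc[r][col] = rhs[r]
            coeffs.append(_det3(mc) / d)
        return coeffs

    a1, a2, a3 = solve([p[0] for p in dst])
    a4, a5, a6 = solve([p[1] for p in dst])
    return (a1, a2, a3, a4, a5, a6)


def warp(a: Affine, p: Point) -> Point:
    """Map a point inside the triangle with its affine parameters."""
    x, y = p
    return (a[0] * x + a[1] * y + a[2], a[3] * x + a[4] * y + a[5])


def affine_change(prev: Affine, cur: Affine) -> float:
    """Hypothetical motion-discontinuity score for one triangle: L1 distance
    between the affine parameters of two consecutive frame pairs."""
    return sum(abs(u - v) for u, v in zip(prev, cur))
```

For example, `triangle_affine([(0, 0), (1, 0), (0, 1)], [(2, 3), (4, 3), (2, 5)])` returns `(2.0, 0.0, 2.0, 0.0, 2.0, 3.0)`. That is a uniform scale by 2 followed by a shift of (2, 3), so `warp` sends `(0.5, 0.5)` to `(3.0, 4.0)`. Three node correspondences give exactly six equations for six unknowns. This is why a triangular mesh yields a unique affine map per patch, with continuity across shared edges.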
