Determining a Structured Spatio-Temporal Representation of Video Content for Efficient Visualization and Indexing

Efficient access to the information contained in video databases requires that a structured representation of the video content be built beforehand. This paper describes an approach in this direction, targeted at video indexing and browsing. Exploiting a 2D motion model estimator, we partition the video into shots, characterize camera motion, and extract and track mobile objects. These steps rely on robust motion estimation, statistical tests, and contextual statistical labeling. The content of each shot can then be viewed on a synoptic frame composed of a mosaic image of the background scene, on which the trajectories of mobile objects are superimposed. The proposed method also provides instantaneous and long-term, qualitative and quantitative object motion cues for content-based indexing. Its different steps, and the system they form, are designed to keep computational cost low while coping with general video content. We provide experimental results on real-world sequences. The structured output opens up important possible extensions, for instance toward higher-level interpretation.
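To make the idea of robust 2D motion model estimation concrete, here is a minimal sketch. It is not the estimator used in the paper: the abstract names neither the motion model nor the robust scheme, so the sketch assumes a 6-parameter affine model, a Tukey biweight with a median-based scale, and iteratively reweighted least squares. It also works on hypothetical pre-computed displacement vectors rather than on image intensities. All function names and constants (`estimate_affine`, `tukey_weight`, the cutoff factor 4.685) are illustrative choices.

```python
from statistics import median

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = sum(M[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (M[r][3] - s) / M[r][r]
    return x

def weighted_fit(points, values, weights):
    """Weighted least squares for value ~ p0 + p1*x + p2*y (normal equations)."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for (x, y), v, w in zip(points, values, weights):
        phi = (1.0, x, y)
        for i in range(3):
            b[i] += w * phi[i] * v
            for j in range(3):
                A[i][j] += w * phi[i] * phi[j]
    return solve3(A, b)

def tukey_weight(r, cutoff):
    """Tukey biweight: smooth down-weighting, exactly zero beyond the cutoff."""
    if abs(r) >= cutoff:
        return 0.0
    t = 1.0 - (r / cutoff) ** 2
    return t * t

def estimate_affine(points, flows, iters=10, c_factor=4.685):
    """Fit u = a0 + a1*x + a2*y and v = b0 + b1*x + b2*y to the dominant motion.

    Returns (pu, pv, weights); points whose final weight is zero do not
    conform to the dominant motion and are candidate moving-object points.
    """
    weights = [1.0] * len(points)
    for _ in range(iters):
        pu = weighted_fit(points, [f[0] for f in flows], weights)
        pv = weighted_fit(points, [f[1] for f in flows], weights)
        res = []
        for (x, y), (u, v) in zip(points, flows):
            du = u - (pu[0] + pu[1] * x + pu[2] * y)
            dv = v - (pv[0] + pv[1] * x + pv[2] * y)
            res.append((du * du + dv * dv) ** 0.5)
        # Assumed robust scale: 1.4826 * median of residual norms (MAD-like).
        scale = 1.4826 * median(res) + 1e-9
        weights = [tukey_weight(r, c_factor * scale) for r in res]
    return pu, pv, weights

# Synthetic demo: a 5x5 grid following one affine motion (the "camera"),
# with a 2x2 block of points moving differently (a hypothetical mobile object).
points = [(float(x), float(y)) for y in range(5) for x in range(5)]
obj = {(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)}
flows = []
for (x, y) in points:
    u = 1.0 + 0.02 * x - 0.01 * y
    v = -0.5 + 0.01 * x + 0.03 * y
    if (x, y) in obj:
        u, v = u + 3.0, v - 2.0
    flows.append((u, v))

pu, pv, weights = estimate_affine(points, flows)
outliers = [p for p, w in zip(points, weights) if w == 0.0]
print(pu, pv, outliers)
```

In this toy setting the zero-weight points play the role of regions not conforming to the dominant (camera) motion. A pipeline like the one the abstract describes could treat them as candidates for mobile-object extraction, and the fitted parameters as input to camera motion characterization.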
