Compact Representations of Videos Through Dominant and Multiple Motion Estimation

An explosion of on-line image and video data in digital form is already well underway. With the exponential rise in interactive information exploration and dissemination through the World-Wide Web (WWW), the major inhibitors of rapid access to on-line video data are the cost and management of capture and storage, the lack of real-time delivery, and the unavailability of content-based intelligent search and indexing techniques. The solutions for capture, storage, and delivery may be on the horizon or a little beyond. However, even with rapid delivery, the lack of efficient authoring and querying tools for visual content-based indexing may still keep video information from being used as widely as text and traditional tabular data are today. In order to browse and index into videos nonlinearly through visual content, it is necessary to develop authoring tools that can automatically separate moving objects and significant components of the scene, and represent these in a compact form. Given that video data comes in torrents (almost a megabyte every 30th of a second), it would be highly inefficient to search for objects and scenes in every frame of a video. In this paper, we present techniques to automatically derive compact representations of scenes and objects from motion information. Image motion is a significant cue in videos for separating scenes into their significant components and for separating moving objects. Motion analysis is useful in capturing the visual content of videos for indexing and browsing in two different ways. First, the static scene can be separated from moving objects by employing dominant 2D/3D motion estimation methods. Alternatively, if the goal is to represent the fixed scene, too, as a composition of significant structures and objects, then simultaneous multiple motion methods might be more appropriate.
In either case, view-based summarized representations of the scene can be created by video compositing/mosaicing based on the estimated motions. We present robust algorithms for both kinds of representations: 1) techniques based on dominant motion estimation, which exploit the fairly common situation in videos where a mostly fixed background (scene) is imaged with or without independently moving objects, and 2) simultaneous multiple motion estimation and representation of motion video using layered representations. Examples of the representations achieved by each method are included in the paper.
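
As an illustration only, the idea behind dominant motion estimation can be sketched as a robust fit of a single 2D affine motion in which the moving objects show up as outliers. The sketch below is not the algorithm of this paper: it assumes point correspondences between two frames are already available (the function names `fit_affine` and `dominant_affine`, the Tukey biweight, and the tuning constant `c` are all assumptions made for the example), and it uses iteratively reweighted least squares with a median-based scale estimate.

```python
import numpy as np

def fit_affine(src, dst, weights):
    """Weighted least-squares fit of a 2x3 affine motion: dst ~ M @ src + t."""
    X = np.hstack([src, np.ones((src.shape[0], 1))])   # (N, 3) design matrix
    w = np.sqrt(weights)[:, None]                      # row scaling for weighted LS
    P, *_ = np.linalg.lstsq(w * X, w * dst, rcond=None)
    return P.T                                         # (2, 3) = [M | t]

def dominant_affine(src, dst, iters=20, c=4.685):
    """Estimate the dominant affine motion by iteratively reweighted least squares.

    Points whose final weight is zero are treated as not belonging to the
    dominant (background) motion, for example independently moving objects.
    """
    weights = np.ones(src.shape[0])
    for _ in range(iters):
        A = fit_affine(src, dst, weights)
        pred = src @ A[:, :2].T + A[:, 2]
        r = np.linalg.norm(pred - dst, axis=1)         # residual length per point
        scale = 1.4826 * np.median(r) + 1e-12          # robust, median-based scale
        u = r / (c * scale)
        # Tukey biweight: smooth down-weighting, exactly zero beyond the cutoff.
        weights = np.where(u < 1.0, (1.0 - u**2) ** 2, 0.0)
    return A, weights > 0
```

Calling `dominant_affine(src, dst)` with two `(N, 2)` arrays of matched point coordinates returns the estimated 2x3 affine matrix and a boolean mask of the points consistent with it. The sketch relies on the dominant motion covering a clear majority of the points, since the scale estimate is taken from the median residual.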
