Segmenting, Modeling, and Matching Video Clips Containing Multiple Moving Objects

This paper presents a novel representation for dynamic scenes composed of multiple rigid objects that may undergo different motions and are observed by a moving camera. Multiview constraints associated with groups of affine-covariant scene patches and a normalized description of their appearance are used to segment a scene into its rigid components, construct three-dimensional models of these components, and match instances of models recovered from different image sequences. The proposed approach has been applied to the detection and matching of moving objects in video sequences and to shot matching, i.e., the identification of shots that depict the same scene in a video clip.
