Segmenting, modeling, and matching video clips containing multiple moving objects

This paper presents a novel representation for dynamic scenes composed of multiple rigid objects that may undergo different motions and are observed by a moving camera. Multiview constraints associated with groups of affine-covariant scene patches and a normalized description of their appearance are used to segment a scene into its rigid components, construct three-dimensional models of these components, and match instances of models recovered from different image sequences. The proposed approach has been applied to the detection and matching of moving objects in video sequences and to shot matching, i.e., the identification of shots that depict the same scene in a video clip

[1]  Jitendra Malik,et al.  Computing Local Surface Orientation and Shape from Texture for Curved Surfaces , 1997, International Journal of Computer Vision.

[2]  S. Shankar Sastry,et al.  Generalized principal component analysis (GPCA) , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[4]  Jonas Gårding,et al.  Shape from texture for smooth curved surfaces in perspective projection , 1992, Journal of Mathematical Imaging and Vision.

[5]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[6]  Philip H. S. Torr,et al.  Outlier detection and motion segmentation , 1993, Other Conferences.

[7]  Martial Hebert,et al.  Provably-convergent iterative methods for projective structure from motion , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[8]  Daphna Weinshall,et al.  Linear and incremental acquisition of invariant shape models from image sequences , 1993, 1993 (4th) International Conference on Computer Vision.

[9]  Andrew W. Fitzgibbon,et al.  Multibody Structure and Motion: 3-D Reconstruction of Independently Moving Objects , 2000, ECCV.

[10]  Glorianna Davenport,et al.  Cinematic primitives for multimedia , 1991, IEEE Computer Graphics and Applications.

[11]  T. Boult,et al.  Factorization-based segmentation of motions , 1991, Proceedings of the IEEE Workshop on Visual Motion.

[12]  Cordelia Schmid,et al.  Local Grayvalue Invariants for Image Retrieval , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[13]  Takeo Kanade,et al.  A Paraperspective Factorization Method for Shape and Motion Recovery , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[14]  Dragutin Petkovic,et al.  Query by Image and Video Content: The QBIC System , 1995, Computer.

[15]  Maarten Vergauwen,et al.  Structure and motion from image sequences , 2001 .

[16]  Cordelia Schmid,et al.  An Affine Invariant Interest Point Detector , 2002, ECCV.

[17]  Andrew W. Fitzgibbon,et al.  Bundle Adjustment - A Modern Synthesis , 1999, Workshop on Vision Algorithms.

[18]  Takeo Kanade,et al.  A multi-body factorization method for motion analysis , 1995, Proceedings of IEEE International Conference on Computer Vision.

[19]  Chong-Wah Ngo,et al.  Recent Advances in Content-Based Video Analysis , 2001, Int. J. Image Graph..

[20]  Shih-Fu Chang,et al.  A fully automated content-based video search engine supporting spatiotemporal queries , 1998, IEEE Trans. Circuits Syst. Video Technol..

[21]  Jitendra Malik,et al.  Tracking people with twists and exponential maps , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[22]  David E. DiFranco Recovery of 3 D Articulated Motion from 2 D Correspondences , 1999 .

[23]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[24]  Rainer Lienhart,et al.  Reliable Transition Detection in Videos: A Survey and Practitioner's Guide , 2001, Int. J. Image Graph..

[25]  Takeo Kanade,et al.  A Paraperspective Factorization Method for Shape and Motion Recovery , 1994, ECCV.

[26]  C Tomasi,et al.  Shape and motion from image streams: a factorization method. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[27]  B. S. Manjunath,et al.  NeTra: A toolbox for navigating large image databases , 1997, Multimedia Systems.

[28]  Andrew E. Johnson,et al.  Surface matching for object recognition in complex three-dimensional scenes , 1998, Image Vis. Comput..

[29]  Paul A. Beardsley,et al.  3D Model Acquisition from Extended Image Sequences , 1996, ECCV.

[30]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[31]  Cordelia Schmid,et al.  3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[32]  PollefeysMarc,et al.  Self-Calibration and Metric Reconstruction Inspite of Varying and Unknown Intrinsic Camera Parameters , 1999 .

[33]  Patrick Bouthemy,et al.  Spatio-Temporal Segmentation and General Motion Characterization for Video Indexing and Retrieval , 1999 .

[34]  Liming Chen,et al.  Video segmentation using 3D hints contained in 2D images , 1996, Other Conferences.

[35]  Cordelia Schmid,et al.  Indexing Based on Scale Invariant Interest Points , 2001, ICCV.

[36]  Cordelia Schmid,et al.  3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints , 2006, International Journal of Computer Vision.

[37]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[38]  M. Hebert,et al.  The Representation, Recognition, and Locating of 3-D Objects , 1986 .

[39]  Andrew Zisserman,et al.  Object Level Grouping for Video Shots , 2004, International Journal of Computer Vision.

[40]  Emanuele Trucco,et al.  Geometric Invariance in Computer Vision , 1995 .

[41]  Jitendra Malik,et al.  Matching Shapes , 2001, ICCV.

[42]  Tony Lindeberg,et al.  Shape-adapted smoothing in estimation of 3-D shape cues from affine deformations of local 2-D brightness structure , 1997, Image Vis. Comput..

[43]  Andrew Zisserman,et al.  Wide baseline stereo matching , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[44]  Adam Baumberg,et al.  Reliable feature matching across widely separated views , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[45]  Wolfgang Spohn,et al.  The Representation of , 1986 .

[46]  Takeo Kanade,et al.  Shape and motion from image streams under orthography: a factorization method , 1992, International Journal of Computer Vision.

[47]  Sanjeev R. Kulkarni,et al.  A framework for measuring video similarity and its application to video query by example , 1999, Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348).

[48]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[49]  Andrew Zisserman,et al.  Applications of Invariance in Computer Vision , 1993, Lecture Notes in Computer Science.

[50]  J.B. Burns,et al.  View Variation of Point-Set and Line-Segment Features , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[51]  Zhengyou Zhang,et al.  Token tracking in a cluttered scene , 1994, Image Vis. Comput..

[52]  Jean Ponce,et al.  On Computing Metric Upgrades of Projective Reconstructions Under the Rectangular Pixel Assumption , 2000, SMILE.

[53]  David G. Lowe,et al.  Local feature view clustering for 3D object recognition , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[54]  Andrew W. Fitzgibbon,et al.  Automatic 3D model acquisition and generation of new images from video sequences , 1998, 9th European Signal Processing Conference (EUSIPCO 1998).

[55]  Andrew Zisserman,et al.  Automated Scene Matching in Movies , 2002, CIVR.

[56]  Cordelia Schmid,et al.  Evaluation of Interest Point Detectors , 2000, International Journal of Computer Vision.

[57]  Tony Lindeberg,et al.  Shape-Adapted Smoothing in Estimation of 3-D Depth Cues from Affine Distortions of Local 2-D Brightness Structure , 1994, ECCV.

[58]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[59]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[60]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[61]  Ullas Gargi,et al.  Performance characterization of video-shot-change detection methods , 2000, IEEE Trans. Circuits Syst. Video Technol..

[62]  John R. Kender,et al.  Video Summaries through Mosaic-Based Shot and Scene Clustering , 2002, ECCV.

[63]  O. Faugeras,et al.  The Geometry of Multiple Images , 1999 .

[64]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[65]  Andrew W. Fitzgibbon,et al.  VHS to VRML: 3D graphical models from video sequences , 1999, Proceedings IEEE International Conference on Multimedia Computing and Systems.

[66]  Andrew W. Fitzgibbon,et al.  On Affine Invariant Clustering and Automatic Cast Listing in Movies , 2002, ECCV.

[67]  Daphna Weinshall,et al.  Linear and Incremental Acquisition of Invariant Shape Models From Image Sequences , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[68]  David Nister,et al.  Automatic Dense Reconstruction from Uncalibrated Video Sequences , 2001 .

[69]  Jitendra Malik,et al.  Blobworld: A System for Region-Based Image Indexing and Retrieval , 1999, VISUAL.

[70]  Noel E. O'Connor,et al.  User interface design for keyframe-based browsing of digital video , 2001 .

[71]  Minerva M. Yeung,et al.  Efficient matching and clustering of video shots , 1995, Proceedings., International Conference on Image Processing.

[72]  Joseph Y.-T. Leung,et al.  Efficient algorithms for interval graphs and circular-arc graphs , 1982, Networks.

[73]  David Edward DiFranco,et al.  Recovery of 3D articulated motion from 2D correspondences , 2000 .

[74]  Cordelia Schmid,et al.  Structuration de vidéos pour des interfaces de consultation avancées , 1998 .

[75]  Reinhard Koch,et al.  Self-calibration and metric reconstruction in spite of varying and unknown internal camera parameters , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).