Automated location matching in movies

We describe progress in matching shots that are images of the same 3D location in a film. The problem is hard because the camera viewpoint may change substantially between shots, with consequent changes in the imaged appearance of the scene due to foreshortening, scale changes, partial occlusion and lighting changes. We develop and compare two methods that achieve this task. In the first method we match key frames between shots using wide baseline matching techniques. The wide baseline method represents each frame by a set of viewpoint covariant local features. Because the features have local spatial support, the frame does not need to be segmented (e.g., into foreground and background), and partial occlusion is tolerated. Matching proceeds through a series of stages, starting with indexing based on a viewpoint invariant description of the features, then applying semi-local constraints (such as spatial consistency), and finally global constraints (such as epipolar geometry). In the second method the temporal continuity within a shot is used to compute invariant descriptors for tracked features, and these descriptors are the basic matching unit. The temporal information increases both the signal-to-noise ratio of the data and the stability of the computed features. We develop analogues of local spatial consistency, cross-correlation and epipolar geometry for these tracks. Shot-matching results for a number of very different scene types are illustrated on two entire commercial films.
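As a rough illustration of the first two matching stages and of a track-level descriptor, the Python sketch below uses toy 2D descriptors and image points. It is a minimal sketch under stated assumptions, not the authors' implementation: brute-force nearest-neighbour search stands in for the indexing stage, the parameters `max_dist`, `k` and `min_support` are illustrative choices, and averaging per-frame descriptors is only one assumed way of exploiting temporal continuity. The global stage (epipolar geometry, usually a robust fundamental-matrix fit) is omitted.

```python
import math
from typing import List, Sequence, Set, Tuple

Vec = Sequence[float]
Match = Tuple[int, int]  # (feature index in frame A, feature index in frame B)


def dist(a: Vec, b: Vec) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_descriptors(desc_a: List[Vec], desc_b: List[Vec],
                      max_dist: float) -> List[Match]:
    """Stage 1 (indexing): pair each feature of frame A with its nearest
    neighbour in frame B in descriptor space, if it is close enough.
    Brute-force search stands in for a real indexing structure."""
    if not desc_b:
        return []
    matches = []
    for i, da in enumerate(desc_a):
        j = min(range(len(desc_b)), key=lambda k: dist(da, desc_b[k]))
        if dist(da, desc_b[j]) <= max_dist:
            matches.append((i, j))
    return matches


def k_nearest(points: List[Vec], idx: int, k: int) -> Set[int]:
    """Indices of the k image points closest to points[idx], excluding idx."""
    others = [j for j in range(len(points)) if j != idx]
    others.sort(key=lambda j: dist(points[idx], points[j]))
    return set(others[:k])


def spatial_consistency(matches: List[Match], pts_a: List[Vec],
                        pts_b: List[Vec], k: int,
                        min_support: int) -> List[Match]:
    """Stage 2 (semi-local): keep match (i, j) only if at least min_support
    other matches connect a spatial neighbour of i in frame A to a spatial
    neighbour of j in frame B."""
    kept = []
    for i, j in matches:
        nbrs_a = k_nearest(pts_a, i, k)
        nbrs_b = k_nearest(pts_b, j, k)
        support = sum(1 for p, q in matches
                      if (p, q) != (i, j) and p in nbrs_a and q in nbrs_b)
        if support >= min_support:
            kept.append((i, j))
    return kept


def track_descriptor(descs_along_track: List[Vec]) -> List[float]:
    """Second method (assumed form): summarise a tracked feature by the
    component-wise mean of its per-frame descriptors, so that independent
    frame-to-frame noise partly averages out."""
    n = len(descs_along_track)
    dim = len(descs_along_track[0])
    return [sum(d[c] for d in descs_along_track) / n for c in range(dim)]


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
    # (4, 1) is a deliberate mismatch: the neighbours of feature 4 in A are
    # not matched to neighbours of feature 1 in B, so it gets no support.
    print(spatial_consistency([(0, 0), (1, 1), (2, 2), (4, 1)], pts, pts,
                              k=2, min_support=1))
    # prints [(0, 0), (1, 1), (2, 2)]
```

In the usage example the spatial-consistency filter discards the wrong match `(4, 1)`, while each of the three correct matches is supported by the other two. A real system would pass the surviving matches on to a global check such as epipolar geometry.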
