SIFT Flow: Dense Correspondence across Different Scenes

While image registration has been studied in different areas of computer vision, aligning images depicting different scenes remains a challenging problem, closer to recognition than to image matching. Analogous to optical flow, where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image to its neighbors in a large image collection consisting of a variety of scenes. For a query image, histogram intersection on a bag-of-visual-words representation is used to find the set of nearest neighbors in the database. The SIFT flow algorithm then consists of matching densely sampled SIFT features between the two images, while preserving spatial discontinuities. The use of SIFT features allows robust matching across different scene/object appearances and the discontinuity-preserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach is able to robustly align complicated scenes with large spatial distortions. We collect a large database of videos and apply the SIFT flow algorithm to two applications: (i) motion field prediction from a single static image and (ii) motion synthesis via transfer of moving objects.

[1]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[2]  Berthold K. P. Horn,et al.  Determining Optical Flow , 1981, Other Conferences.

[3]  Olga Veksler,et al.  Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[4]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[5]  Jitendra Malik,et al.  Shape Context: A New Descriptor for Shape Matching and Object Recognition , 2000, NIPS.

[6]  Olga Veksler,et al.  Fast Approximate Energy Minimization via Graph Cuts , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  D. Scharstein,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001).

[8]  Martin J. Wainwright,et al.  Exact MAP Estimates by (Hyper)tree Agreement , 2002, NIPS.

[9]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[10]  Patrick Pérez,et al.  Poisson image editing , 2003, ACM Trans. Graph..

[11]  Jiří Matas,et al.  Computer Vision - ECCV 2004 , 2004, Lecture Notes in Computer Science.

[12]  Thomas Brox,et al.  High Accuracy Optical Flow Estimation Based on a Theory for Warping , 2004, ECCV.

[13]  Daniel P. Huttenlocher,et al.  Efficient Belief Propagation for Early Vision , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[14]  Joachim Weickert,et al.  Lucas/Kanade Meets Horn/Schunck: Combining Local and Global Optic Flow Methods , 2005, International Journal of Computer Vision.

[15]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[16]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[17]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18]  Nebojsa Jojic,et al.  LOCUS: learning object classes with unsupervised segmentation , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[19]  J. Weickert,et al.  Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods , 2005 .

[20]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[21]  Trevor Darrell,et al.  Pyramid Match Kernels: Discriminative Classification with Sets of Image Features (version 2) , 2006 .

[22]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[23]  Václav Hlavác,et al.  Efficient MRF Deformation Model for Non-Rigid Image Matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Antonio Torralba,et al.  Object Recognition by Scene Alignment , 2007, NIPS.

[25]  Alexei A. Efros,et al.  Scene completion using millions of photographs , 2007, SIGGRAPH 2007.

[26]  Richard Szeliski,et al.  A Database and Evaluation Methodology for Optical Flow , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[27]  Alexei A. Efros,et al.  Photo clip art , 2007, ACM Trans. Graph..