SIFT Flow: Dense Correspondence across Scenes and Its Applications

While image alignment has been studied in different areas of computer vision for decades, aligning images depicting different scenes remains a challenging problem. Analogous to optical flow, where an image is aligned to its temporally adjacent frame, we propose SIFT flow, a method to align an image to its nearest neighbors in a large image corpus containing a variety of scenes. The SIFT flow algorithm consists of matching densely sampled, pixelwise SIFT features between two images while preserving spatial discontinuities. The SIFT features allow robust matching across different scene/object appearances, whereas the discontinuity-preserving spatial model allows matching of objects located at different parts of the scene. Experiments show that the proposed approach robustly aligns complex scene pairs containing significant spatial differences. Based on SIFT flow, we propose an alignment-based large database framework for image analysis and synthesis, where image information is transferred from the nearest neighbors to a query image according to the dense scene correspondence. This framework is demonstrated through concrete applications such as motion field prediction from a single image, motion synthesis via object transfer, satellite image registration, and face recognition.

[1]  Gebräuchliche Fertigarzneimittel,et al.  V , 1893, Therapielexikon Neurologie.

[2]  Marsha Jo Hannah,et al.  Computer matching of areas in stereo images. , 1974 .

[3]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[4]  Berthold K. P. Horn,et al.  Determining Optical Flow , 1981, Other Conferences.

[5]  W. Eric L. Grimson,et al.  Computational Experiments with a Feature Based Stereo Algorithm , 1985, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[7]  David J. Fleet,et al.  Phase-based disparity measurement , 1991, CVGIP Image Underst..

[8]  Jitendra Malik,et al.  A Computational Framework for Determining Stereo Correspondence from a Set of Linear Spatial Filters , 1991, ECCV.

[9]  P. Anandan,et al.  Hierarchical Model-Based Motion Estimation , 1992, ECCV.

[10]  Jitendra Malik,et al.  Computational framework for determining stereo correspondence from a set of linear spatial filters , 1992, Image Vis. Comput..

[11]  Andy Harter,et al.  Parameterisation of a stochastic model for human face identification , 1994, Proceedings of 1994 IEEE Workshop on Applications of Computer Vision.

[12]  Rosalind W. Picard,et al.  Texture orientation for sorting photos "at a glance" , 1994, Proceedings of 12th International Conference on Pattern Recognition.

[13]  Yair Weiss,et al.  Interpreting Images by Propagating Bayesian Beliefs , 1996, NIPS.

[14]  Michael J. Black,et al.  The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields , 1996, Comput. Vis. Image Underst..

[15]  Yair Weiss,et al.  Smoothness in layers: Motion segmentation using nonparametric mixture estimation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Timothy F. Cootes,et al.  Active Appearance Models , 1998, ECCV.

[17]  Olga Veksler,et al.  Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[18]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[19]  William T. Freeman,et al.  Learning low-level vision , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[20]  Jitendra Malik,et al.  Shape Context: A New Descriptor for Shape Matching and Object Recognition , 2000, NIPS.

[21]  Vladimir Kolmogorov,et al.  Computing visual correspondence with occlusions using graph cuts , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[22]  Olga Veksler,et al.  Fast Approximate Energy Minimization via Graph Cuts , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[23]  D. Scharstein,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, Proceedings IEEE Workshop on Stereo and Multi-Baseline Vision (SMBV 2001).

[24]  Nanning Zheng,et al.  Stereo Matching Using Belief Propagation , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[25]  Jitendra Malik,et al.  Blobworld: Image Segmentation Using Expectation-Maximization and Its Application to Image Querying , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[26]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[27]  Patrick Pérez,et al.  Poisson image editing , 2003, ACM Trans. Graph..

[28]  Cordelia Schmid,et al.  Evaluation of Interest Point Detectors , 2000, International Journal of Computer Vision.

[29]  Michael J. Swain,et al.  Color indexing , 1991, International Journal of Computer Vision.

[30]  William T. Freeman,et al.  Learning Low-Level Vision , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[31]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[32]  Thomas Brox,et al.  High Accuracy Optical Flow Estimation Based on a Theory for Warping , 2004, ECCV.

[33]  Paul A. Viola,et al.  Alignment by Maximization of Mutual Information , 1997, International Journal of Computer Vision.

[34]  Daniel P. Huttenlocher,et al.  Efficient Belief Propagation for Early Vision , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[35]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[36]  Richard Szeliski,et al.  A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms , 2001, International Journal of Computer Vision.

[37]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[38]  David J. Fleet,et al.  Performance of optical flow techniques , 1994, International Journal of Computer Vision.

[39]  Luc Van Gool,et al.  Real-time connectivity constrained depth map computation using programmable graphics hardware , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[40]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[41]  Nebojsa Jojic,et al.  LOCUS: learning object classes with unsupervised segmentation , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[42]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[43]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[44]  J. Weickert,et al.  Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods , 2005 .

[45]  Andrew Blake,et al.  Cosegmentation of Image Pairs by Histogram Matching - Incorporating a Global Constraint into MRFs , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[46]  Edward H. Adelson,et al.  Analysis of Contour Motions , 2006, NIPS.

[47]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[48]  Trevor Darrell,et al.  Pyramid Match Kernels: Discriminative Classification with Sets of Image Features (version 2) , 2006 .

[49]  Chia-Ling Tsai,et al.  Registration of Challenging Image Pairs: Initialization, Estimation, and Decision , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[51]  Václav Hlavác,et al.  Efficient MRF Deformation Model for Non-Rigid Image Matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[52]  Antonio Torralba,et al.  Object Recognition by Scene Alignment , 2007, NIPS.

[53]  Alexei A. Efros,et al.  Scene completion using millions of photographs , 2007, SIGGRAPH 2007.

[54]  Richard Szeliski,et al.  A Database and Evaluation Methodology for Optical Flow , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[55]  Shai Avidan,et al.  Ensemble Tracking , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56]  Yuxiao Hu,et al.  Learning a Spatially Smooth Subspace for Face Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[57]  Edward H. Adelson,et al.  Human-assisted motion annotation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[58]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[59]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Different Scenes , 2008, ECCV.

[60]  Richard Szeliski,et al.  A Comparative Study of Energy Minimization Methods for Markov Random Fields with Smoothness-Based Priors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[61]  C. Bregler,et al.  Large displacement optical flow , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[62]  Antonio Torralba,et al.  Nonparametric scene parsing: Label transfer via dense scene alignment , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[63]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.