Alignment by Composition

We propose an unsupervised method for establishing dense semantic correspondences between images that depict different instances of the same object category. We posit that alignment is compositional in nature and requires detecting a similar visual concept in both images. We realize this in a top-down fashion, using objectness, saliency, and visual similarity cues to co-localize the regions of the holistic foreground objects. The target foreground object is then composed from subregions of the source foreground object by jointly maximizing visual similarity and bounding the geometric distortion induced by the configuration of those subregions. The resulting composition is used to form a dense motion field that aligns the two images. Experimental results on several benchmark datasets support the efficacy of the proposed method.
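
The composition step described above can be illustrated with a small toy sketch. This is not the authors' algorithm: it assumes the foreground has already been co-localized and divided into a regular grid of subregions, that each subregion comes with a precomputed list of candidate displacements and similarity scores, and that distortion is bounded greedily against the left and upper neighbours only. The names `compose`, `dense_field`, `candidates`, `max_distortion`, and `cell_size` are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

Vec = Tuple[int, int]  # (row, col) grid cell, or (dy, dx) displacement


def compose(candidates: Dict[Vec, List[Tuple[Vec, float]]],
            max_distortion: int) -> Dict[Vec, Vec]:
    """Greedily pick one displacement per source subregion.

    `candidates` maps a grid cell to (displacement, similarity) pairs.
    Cells are visited in raster order. A displacement is admissible only if
    it differs from the displacements already chosen for the upper and left
    neighbours by at most `max_distortion` (L-infinity norm). Among the
    admissible candidates, the most similar one wins.
    """
    chosen: Dict[Vec, Vec] = {}
    for cell in sorted(candidates):
        r, c = cell
        neighbours = [chosen[n] for n in ((r - 1, c), (r, c - 1)) if n in chosen]

        def admissible(d: Vec) -> bool:
            return all(max(abs(d[0] - n[0]), abs(d[1] - n[1])) <= max_distortion
                       for n in neighbours)

        pool = [p for p in candidates[cell] if admissible(p[0])]
        if not pool:
            # Assumption: if no candidate respects the bound, fall back to
            # the unconstrained best match rather than leaving a hole.
            pool = candidates[cell]
        chosen[cell] = max(pool, key=lambda p: p[1])[0]
    return chosen


def dense_field(chosen: Dict[Vec, Vec], cell_size: int,
                height: int, width: int) -> List[List[Vec]]:
    """Expand per-subregion displacements into a per-pixel motion field."""
    max_r = max(r for r, _ in chosen)
    max_c = max(c for _, c in chosen)
    return [[chosen[(min(y // cell_size, max_r), min(x // cell_size, max_c))]
             for x in range(width)]
            for y in range(height)]


# Toy 2x2 grid of subregions, each with hand-written candidates.
toy_candidates = {
    (0, 0): [((1, 1), 0.9), ((5, 5), 0.2)],
    (0, 1): [((6, 6), 0.95), ((1, 2), 0.8)],  # best match violates the bound
    (1, 0): [((2, 1), 0.7)],
    (1, 1): [((2, 2), 0.6), ((0, 0), 0.9)],   # best match violates the bound
}
toy_chosen = compose(toy_candidates, max_distortion=1)
toy_field = dense_field(toy_chosen, cell_size=2, height=4, width=4)
```

In this toy input, two subregions have a highest-similarity candidate that is rejected because it would distort the configuration beyond the bound, which is the trade-off between similarity and geometric consistency that the abstract describes. A real system would optimize the two terms jointly rather than greedily and would interpolate the motion field instead of copying one displacement per cell.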
