Semantic Correspondence: A Hierarchical Approach

Establishing semantic correspondence across images when the objects in the images have undergone complex deformations remains a challenging task in the field of computer vision. In this paper, we propose a hierarchical method to tackle this problem by first semantically targeting the foreground objects to localize the search space and then looking deeply into multiple levels of the feature representation to search for point-level correspondence. In contrast to existing approaches, which typically penalize large discrepancies, our approach allows for significant displacements, with the aim to accommodate large deformations of the objects in scene. Localizing the search space by semantically matching object-level correspondence, our method robustly handles large deformations of objects. Representing the target region by concatenated hypercolumn features which take into account the hierarchical levels of the surrounding context, helps to clear the ambiguity to further improve the accuracy. By conducting multiple experiments across scenes with non-rigid objects, we validate the proposed approach, and show that it outperforms the state of the art methods for semantic correspondence establishment.

[1]  Silvio Savarese,et al.  Universal Correspondence Network , 2016, NIPS.

[2]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Richard Szeliski,et al.  Building Rome in a day , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Yong Jae Lee,et al.  FlowWeb: Joint image set alignment by weaving consistent, pixel-wise correspondences , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  M. Bernardine Dias,et al.  The Dynamic Hungarian Algorithm for the Assignment Problem with Changing Costs , 2007 .

[6]  Joachim Weickert,et al.  Illumination-Robust Variational Optical Flow with Photometric Invariants , 2007, DAGM-Symposium.

[7]  Pietro Perona,et al.  Caltech-UCSD Birds 200 , 2010 .

[8]  Gang Hua,et al.  Discriminative Learning of Local Image Descriptors , 1990, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[10]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[11]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[12]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[13]  Cordelia Schmid,et al.  DeepMatching: Hierarchical Deformable Dense Matching , 2015, International Journal of Computer Vision.

[14]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[15]  Thomas Brox,et al.  High Accuracy Optical Flow Estimation Based on a Theory for Warping , 2004, ECCV.

[16]  Simon Lucey,et al.  Dense Semantic Correspondence Where Every Pixel is a Classifier , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Berthold K. P. Horn,et al.  Determining Optical Flow , 1981, Other Conferences.

[18]  Antonio Torralba,et al.  SIFT Flow: Dense Correspondence across Scenes and Its Applications , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Sebastian Thrun,et al.  The Correlated Correspondence Algorithm for Unsupervised Registration of Nonrigid Surfaces , 2004, NIPS.

[20]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[21]  Richard Szeliski,et al.  A Comparative Study of Energy Minimization Methods for Markov Random Fields with Smoothness-Based Priors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[23]  Trevor Darrell,et al.  Do Convnets Learn Correspondence? , 2014, NIPS.

[24]  Quoc V. Le A Tutorial on Deep Learning Part 2: Autoencoders, Convolutional Neural Networks and Recurrent Neural Networks , 2015 .

[25]  Lubomir D. Bourdev Poselets and Their Applications in High-Level Computer Vision , 2011 .

[26]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  David W. Jacobs,et al.  WarpNet: Weakly Supervised Matching for Single-View Reconstruction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Ce Liu,et al.  Deformable Spatial Pyramid Matching for Fast Dense Correspondences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Nikos Komodakis,et al.  Learning to compare image patches via convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Bernd Girod,et al.  Feature Matching Performance of Compact Descriptors for Visual Search , 2014, 2014 Data Compression Conference.