Robust Image Matching By Dynamic Feature Selection

Estimating dense correspondences between images is a long-standing image understanding task. Recent works use convolutional neural networks (CNNs) to extract high-level feature maps and find correspondences through feature matching. However, high-level feature maps have low spatial resolution and therefore cannot provide the accurate, fine-grained features needed to distinguish intra-class variations in correspondence matching. To address this problem, we generate robust features by dynamically selecting features at different scales. To resolve two critical issues in feature selection, i.e., how many and which scales of features to select, we frame the feature selection process as a sequential Markov decision process (MDP) and learn a selection strategy using reinforcement learning (RL). We define an RL environment for image matching in which each action either requests new features or terminates the selection episode by referring to a matching score. Deep neural networks are incorporated into our method and trained for decision making. Experimental results show that our method achieves performance comparable or superior to state-of-the-art methods on three benchmarks, demonstrating the effectiveness of our feature selection strategy.
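
The episode structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `FeatureSelectionEnv`, the function `toy_score`, the binary-mask state, and the reward values are all hypothetical, chosen only to show how an action can either add a feature scale or end the episode and receive a matching score.

```python
import numpy as np


def toy_score(mask):
    """Hypothetical matching score: made-up per-scale gains minus a cost per selected scale."""
    gains = np.array([0.0, 0.5, 0.4, 0.0])  # assumed values, for illustration only
    return float(gains[mask].sum() - 0.1 * mask.sum())


class FeatureSelectionEnv:
    """Toy episode: select feature scales one at a time, or stop and be scored."""

    def __init__(self, num_scales, score_fn):
        self.num_scales = num_scales
        self.score_fn = score_fn
        self.stop_action = num_scales  # action index that terminates the episode
        self.reset()

    def reset(self):
        # State: boolean mask of the scales selected so far.
        self.selected = np.zeros(self.num_scales, dtype=bool)
        self.steps = 0
        return self.selected.copy()

    def step(self, action):
        """Return (state, reward, done) for one action."""
        if action == self.stop_action:
            # Terminate: the reward is the matching score of the selected scales.
            return self.selected.copy(), self.score_fn(self.selected), True
        self.selected[action] = True
        self.steps += 1
        if self.steps >= self.num_scales:
            # Step budget exhausted: force termination and score.
            return self.selected.copy(), self.score_fn(self.selected), True
        # Selecting a scale gives no immediate reward in this sketch.
        return self.selected.copy(), 0.0, False


env = FeatureSelectionEnv(num_scales=4, score_fn=toy_score)
env.reset()
_, r1, d1 = env.step(1)                      # select scale 1
_, r2, d2 = env.step(2)                      # select scale 2
state, r3, d3 = env.step(env.stop_action)    # stop and receive the score
```

In this sketch a learned policy would replace the hand-picked action sequence; the per-scale cost in `toy_score` stands in for whatever trade-off discourages selecting every scale.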
