Multimodal matching using a Hybrid Convolutional Neural Network

In this work we propose a novel Convolutional Neural Network (CNN) architecture for matching pairs of image patches acquired by different sensors. Our approach uses two CNN sub-networks: the first is a Siamese CNN, and the second consists of dual non-weight-sharing CNNs. This allows simultaneous joint and disjoint processing of the input pair of multimodal image patches. Training convergence and test accuracy are improved by introducing auxiliary losses and a corresponding hard negative mining scheme. The proposed approach is experimentally shown to compare favorably with contemporary state-of-the-art schemes when applied to multiple datasets of multimodal images. The code implementing the proposed scheme has been made publicly available.
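To make the two-branch idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes flattened patches, single dense layers standing in for the CNN branches, feature concatenation as the fusion step, binary cross-entropy for the main and auxiliary losses, and "largest loss" as the hardness criterion for negative mining. None of these choices is stated in the abstract, and all names (`HybridMatcher`, `hardest_negatives`, `aux_weight`) are hypothetical.

```python
import math
import random

def dense_relu(weights, x):
    """Dense layer (list of weight rows) followed by ReLU; a stand-in for a CNN branch."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def init(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def score(head, features):
    """Linear head plus sigmoid, giving a match probability in (0, 1)."""
    z = sum(w * v for w, v in zip(head, features))
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, label):
    """Binary cross-entropy for a single prediction (assumed loss form)."""
    p = min(max(p, 1e-7), 1.0 - 1e-7)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

class HybridMatcher:
    """Hypothetical hybrid matcher: a Siamese branch plus a non-weight-sharing branch."""

    def __init__(self, in_dim, feat_dim, seed=0):
        rng = random.Random(seed)
        self.shared = init(feat_dim, in_dim, rng)  # Siamese: one weight set for both patches
        self.enc_a = init(feat_dim, in_dim, rng)   # asymmetric: used for modality A only
        self.enc_b = init(feat_dim, in_dim, rng)   # asymmetric: used for modality B only
        self.head_sym = init(1, 2 * feat_dim, rng)[0]
        self.head_asym = init(1, 2 * feat_dim, rng)[0]
        self.head_main = init(1, 4 * feat_dim, rng)[0]

    def forward(self, patch_a, patch_b):
        # Joint processing: the same weights see both modalities.
        sym = dense_relu(self.shared, patch_a) + dense_relu(self.shared, patch_b)
        # Disjoint processing: each modality has its own weights.
        asym = dense_relu(self.enc_a, patch_a) + dense_relu(self.enc_b, patch_b)
        main = score(self.head_main, sym + asym)  # fused prediction (assumed: concatenation)
        return main, score(self.head_sym, sym), score(self.head_asym, asym)

    def loss(self, patch_a, patch_b, label, aux_weight=0.5):
        # Main loss plus one auxiliary loss per branch; aux_weight is an assumed hyperparameter.
        main, aux_sym, aux_asym = self.forward(patch_a, patch_b)
        return bce(main, label) + aux_weight * (bce(aux_sym, label) + bce(aux_asym, label))

def hardest_negatives(model, negatives, k):
    """Return the k non-matching pairs with the largest loss under the current model."""
    ranked = sorted(negatives, key=lambda pair: model.loss(pair[0], pair[1], 0), reverse=True)
    return ranked[:k]

# Usage with random stand-in patches (16 values each).
rng = random.Random(1)
model = HybridMatcher(in_dim=16, feat_dim=4)
negatives = [([rng.random() for _ in range(16)], [rng.random() for _ in range(16)])
             for _ in range(8)]
hard = hardest_negatives(model, negatives, k=2)
```

In this sketch the auxiliary heads give each branch its own supervision signal, and the mined negatives would be fed back into the next training step; the actual training loop and optimizer are omitted.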
