TS-NET: Combining Modality Specific and Common Features for Multimodal Patch Matching

Multimodal patch matching addresses the problem of finding the correspondences between image patches from two different modalities, e.g. RGB vs sketch or RGB vs near-infrared. The comparison of patches of different modalities can be done by discovering the information common to both modalities (Siamese like approaches) or the modality-specific information (Pseudo-Siamese like approaches). We observed that none of these two scenarios is optimal. This motivates us to propose a three-stream architecture, dubbed as TS-Net, combining the benefits of the two. In addition, we show that adding extra constraints in the intermediate layers of such networks further boosts the performance. Experimentations on three multimodal datasets show significant performance gains in comparison with Siamese and Pseudo-Siamese networks† †Codes and resources available at http://github.com/ensv/TS-Net.

[1]  Inderjit S. Dhillon,et al.  Metric and Kernel Learning Using a Linear Transformation , 2009, J. Mach. Learn. Res..

[2]  Cristhian Aguilera,et al.  Cross-Spectral Local Descriptors via Quadruplet Network , 2017, Sensors.

[3]  Bin Fan,et al.  L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Cristhian Aguilera,et al.  Learning Cross-Spectral Similarity Measures with Deep Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[5]  Pascal Fua,et al.  Learning to Match Aerial Images with Deep Attentive Architectures , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Yann LeCun,et al.  Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches , 2015, J. Mach. Learn. Res..

[7]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  Christopher Hunt,et al.  Notes on the OpenSURF Library , 2009 .

[9]  Raquel Urtasun,et al.  Exploiting Deep Matching and SAR Data for the Geo-Localization Accuracy Improvement of Optical Satellite Images , 2017, Remote. Sens..

[10]  Horst Bischof,et al.  Learned local descriptors for recognition and matching , 2008 .

[11]  Rahul Sukthankar,et al.  MatchNet: Unifying feature and metric learning for patch-based matching , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Angel D. Sappa,et al.  Cross-spectral image patch similarity using convolutional neural network , 2017, 2017 IEEE International Workshop of Electronics, Control, Measurement, Signals and their Application to Mechatronics (ECMSM).

[13]  Gang Hua,et al.  Discriminative Learning of Local Image Descriptors , 1990, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Jan-Michael Frahm,et al.  Comparative Evaluation of Binary Features , 2012, ECCV.

[15]  Guillermo Sapiro,et al.  Not Afraid of the Dark: NIR-VIS Face Recognition via Cross-Spectral Hallucination and Low-Rank Embedding , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Vincent Lepetit,et al.  DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Gustavo Carneiro,et al.  Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimizing Global Loss Functions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[19]  Krystian Mikolajczyk,et al.  PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors , 2016, ArXiv.

[20]  Nikos Komodakis,et al.  Learning to compare image patches via convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).