A Time Sequence Images Matching Method Based on the Siamese Network

Similarity analysis of time-sequence images for image matching is a foundation for tasks in dynamic environments, such as multi-object tracking and dynamic gesture recognition. We therefore propose a time-sequence image matching method based on a Siamese network. Inspired by contrastive learning, we design two comparison modules and embed them in the network. The first module compares the two input images and generates a correlation matrix. The second module compares this correlation matrix with a template to calculate the similarity. An improved loss function constrains both the image matching and the similarity calculation. Experimental results show that the method achieves better matching performance and also has some ability to estimate the camera pose.
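The two-stage comparison described above can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. A single linear layer with ReLU stands in for the shared (Siamese) CNN backbone. The first comparison is assumed to be pairwise cosine similarity between patch features, giving the correlation matrix. The template is assumed to be an identity matrix (patch i in one frame corresponds to patch i in the next). The classic contrastive loss stands in for the paper's improved loss, whose form the abstract does not give. All function names and data below are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def encode(patches: List[Vector], weights: Matrix) -> List[Vector]:
    """Shared (Siamese) encoder: the SAME weights are applied to both images.
    Hypothetical stand-in for a CNN backbone: linear projection + ReLU per patch."""
    return [[max(0.0, sum(w * x for w, x in zip(row, p))) for row in weights]
            for p in patches]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity; eps avoids division by zero for all-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + eps)


def correlation_matrix(fa: List[Vector], fb: List[Vector]) -> Matrix:
    """First comparison (assumed form): C[i][j] = cos(patch i of A, patch j of B)."""
    return [[cosine(u, v) for v in fb] for u in fa]


def template_similarity(corr: Matrix, template: Matrix) -> float:
    """Second comparison (assumed form): cosine of flattened C against template T."""
    flat_c = [x for row in corr for x in row]
    flat_t = [x for row in template for x in row]
    return cosine(flat_c, flat_t)


def identity(n: int) -> Matrix:
    """Hypothetical template: each patch should match the same position."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def contrastive_loss(sim: float, same: bool, margin: float = 0.5) -> float:
    """Classic contrastive loss on distance d = 1 - sim.
    A generic stand-in, NOT the paper's improved loss."""
    d = 1.0 - sim
    return d * d if same else max(0.0, margin - d) ** 2


# Usage: three patches with three features each (toy data).
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
img_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
img_b = [row[:] for row in img_a]                             # unchanged frame
img_c = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]  # patches shifted

fa, fb, fc = (encode(x, weights) for x in (img_a, img_b, img_c))
T = identity(3)
s_same = template_similarity(correlation_matrix(fa, fb), T)   # close to 1.0
s_shift = template_similarity(correlation_matrix(fa, fc), T)  # close to 0.0
```

Because the encoder weights are shared, both frames are embedded in the same feature space, so the correlation matrix is meaningful; the template comparison then collapses that matrix into a single similarity score that a loss can supervise.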
