论文信息 - Siamese Networks: The Tale of Two Manifolds

Siamese Networks: The Tale of Two Manifolds

Siamese networks are non-linear deep models that have found their ways into a broad set of problems in learning theory, thanks to their embedding capabilities. In this paper, we study Siamese networks from a new perspective and question the validity of their training procedure. We show that in the majority of cases, the objective of a Siamese network is endowed with an invariance property. Neglecting the invariance property leads to a hindrance in training the Siamese networks. To alleviate this issue, we propose two Riemannian structures and generalize a well-established accelerated stochastic gradient descent method to take into account the proposed Riemannian structures. Our empirical evaluations suggest that by making use of the Riemannian geometry, we achieve state-of-the-art results against several algorithms for the challenging problem of fine-grained image classification.

[1] Silvere Bonnabel,et al. Regression on Fixed-Rank Positive Semidefinite Matrices: A Riemannian Approach , 2010, J. Mach. Learn. Res..

[2] Xiaogang Wang,et al. Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[3] Matthieu Cord,et al. Learning a Distance Metric from Relative Comparisons between Quadruplets of Images , 2016, International Journal of Computer Vision.

[4] Kihyuk Sohn,et al. Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.

[5] Xudong Lin,et al. Deep Variational Metric Learning , 2018, ECCV.

[6] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[7] Alexander J. Smola,et al. Sampling Matters in Deep Embedding Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[8] Gustavo Carneiro,et al. Learning Local Image Descriptors with Deep Siamese and Triplet Convolutional Networks by Minimizing Global Loss Functions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Luca Bertinetto,et al. Fully-Convolutional Siamese Networks for Object Tracking , 2016, ECCV Workshops.

[10] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[12] Jonathan Tompson,et al. Efficient object localization using Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Yann LeCun,et al. Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[14] Robert E. Mahony,et al. Optimization Algorithms on Matrix Manifolds , 2007 .

[15] Kilian Q. Weinberger,et al. Distance Metric Learning for Large Margin Nearest Neighbor Classification , 2005, NIPS.

[16] Nir Ailon,et al. Deep Metric Learning Using Triplet Network , 2014, SIMBAD.

[17] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[18] Raquel Urtasun,et al. Few-Shot Learning Through an Information Retrieval Lens , 2017, NIPS.

[19] Jonathan Krause,et al. 3D Object Representations for Fine-Grained Categorization , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[20] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Lars Petersson,et al. Bilinear Attention Networks for Person Retrieval , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23] Gustavo Carneiro,et al. Smart Mining for Deep Metric Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24] Jing Zhang,et al. Few-Shot Learning via Saliency-Guided Hallucination of Samples , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Pietro Perona,et al. The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[26] Kiyoharu Aizawa,et al. Significance of Softmax-Based Features in Comparison to Distance Metric Learning-Based Features , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27] Wotao Yin,et al. A feasible method for optimization with orthogonality constraints , 2013, Math. Program..

[28] Chen Huang,et al. Local Similarity-Aware Deep Feature Embedding , 2016, NIPS.

[29] Nikos Komodakis,et al. Learning to compare image patches via convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30] Gene H. Golub,et al. Matrix computations , 1983 .

[31] Raquel Urtasun,et al. Deep Spectral Clustering Learning , 2017, ICML.

[32] Jian Cheng,et al. NormFace: L2 Hypersphere Embedding for Face Verification , 2017, ACM Multimedia.

[33] Victor S. Lempitsky,et al. Learning Deep Embeddings with Histogram Loss , 2016, NIPS.

[34] Alan Edelman,et al. The Geometry of Algorithms with Orthogonality Constraints , 1998, SIAM J. Matrix Anal. Appl..

[35] Raquel Urtasun,et al. Efficient Deep Learning for Stereo Matching , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36] Jian Wang,et al. Deep Metric Learning with Angular Loss , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37] Bohyung Han,et al. Multi-object Tracking with Quadruplet Convolutional Neural Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38] Xiang Yu,et al. Deep Metric Learning via Lifted Structured Feature Embedding , 2016 .

[39] Bhiksha Raj,et al. SphereFace: Deep Hypersphere Embedding for Face Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40] Serge J. Belongie,et al. Learning deep representations for ground-to-aerial geolocalization , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41] Xing Ji,et al. CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42] Christoph H. Lampert,et al. Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43] Gang Wang,et al. Gated Siamese Convolutional Neural Network Architecture for Human Re-identification , 2016, ECCV.

[44] Gregory R. Koch,et al. Siamese Neural Networks for One-Shot Image Recognition , 2015 .

[45] Iasonas Kokkinos,et al. Discriminative Learning of Deep Convolutional Feature Point Descriptors , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[46] Albert Gordo,et al. End-to-End Learning of Deep Visual Representations for Image Retrieval , 2016, International Journal of Computer Vision.

[47] Arnold W. M. Smeulders,et al. UvA-DARE (Digital Academic Repository) Siamese Instance Search for Tracking , 2016 .

[48] Stefanie Jegelka,et al. Deep Metric Learning via Facility Location , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49] Silvere Bonnabel,et al. Stochastic Gradient Descent on Riemannian Manifolds , 2011, IEEE Transactions on Automatic Control.

[50] Ming Yang,et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[51] Honglak Lee,et al. An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[52] Yung-Yu Chuang,et al. DeepCD: Learning Deep Complementary Descriptors for Patch Representations , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[53] Francis R. Bach,et al. Low-Rank Optimization on the Cone of Positive Semidefinite Matrices , 2008, SIAM J. Optim..

[54] Luc Van Gool,et al. Building Deep Networks on Grassmann Manifolds , 2016, AAAI.

[55] Xiaogang Wang,et al. Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[56] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .

[57] Jiwen Lu,et al. Discriminative Deep Metric Learning for Face Verification in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[58] Stefanos Zafeiriou,et al. ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[59] Jürgen Schmidhuber,et al. Multimodal Similarity-Preserving Hashing , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60] John M. Lee. Introduction to Smooth Manifolds , 2002 .

[61] Rui Zhang,et al. Museum Exhibit Identification Challenge for the Supervised Domain Adaptation and Beyond , 2018, ECCV.

[62] Bamdev Mishra,et al. Fixed-rank matrix factorizations and Riemannian low-rank optimization , 2012, Comput. Stat..

[63] Jiwen Lu,et al. Deep Adversarial Metric Learning , 2020, IEEE Transactions on Image Processing.

[64] Yann LeCun,et al. Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).