Disentangling Nonlinear Perceptual Embeddings With Multi-Query Triplet Networks

In typical perceptual tasks, higher-order concepts are inferred from visual features to assist with perceptual decision making. However, there is a multitude of visual concepts which can be inferred from a single stimulus. When learning nonlinear embeddings with siamese or triplet networks from similarities, we typically assume they are sourced from a single visual concept. In this paper, we are concerned with the hypothesis that it can be potentially harmful to ignore the heterogeneity of concepts affiliated with observed similarities when learning these embedding networks. We demonstrate empirically that this hypothesis holds and suggest an approach that deals with these shortcomings, by combining multiple notions of similarities in one compact system. We propose Multi-Query Networks (MQNs) that leverage recent advances in representation learning on factorized triplet embeddings in combination with Convolutional Networks in order to learn embeddings differentiated into semantically distinct subspaces, which are learned with a latent space attention mechanism. We show that the resulting model learns visually relevant semantic subspaces with features that do not only outperform single triplet networks, but even sets of concept specific networks.

[1]  Geoffrey E. Hinton,et al.  Visualizing non-metric similarities in multiple maps , 2011, Machine Learning.

[2]  Ehsan Amid,et al.  Multiview Triplet Embedding: Learning Attributes in Multiple Maps , 2015, ICML.

[3]  Alex Graves,et al.  DRAW: A Recurrent Neural Network For Image Generation , 2015, ICML.

[4]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[5]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[6]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[7]  Yang Song,et al.  Learning Fine-Grained Image Similarity with Deep Ranking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Jeff A. Bilmes,et al.  On Deep Multi-View Representation Learning , 2015, ICML.

[9]  Gunnar Rätsch,et al.  Large Scale Multiple Kernel Learning , 2006, J. Mach. Learn. Res..

[10]  Gert R. G. Lanckriet,et al.  Learning Multi-modal Similarity , 2010, J. Mach. Learn. Res..

[11]  Serge J. Belongie,et al.  Bayesian representation learning with oracle constraints , 2015, ICLR 2016.

[12]  Subhransu Maji,et al.  Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[13]  Kilian Q. Weinberger,et al.  Stochastic triplet embedding , 2012, 2012 IEEE International Workshop on Machine Learning for Signal Processing.

[14]  Yann LeCun,et al.  Learning a similarity metric discriminatively, with application to face verification , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15]  Subhransu Maji,et al.  Bilinear CNN Models for Fine-Grained Visual Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Jason Weston,et al.  A unified architecture for natural language processing: deep neural networks with multitask learning , 2008, ICML '08.

[17]  Samy Bengio,et al.  Large Scale Online Learning of Image Similarity through Ranking , 2009, IbPRIA.

[18]  David J. Kriegman,et al.  Learning Concept Embeddings with Combined Human-Machine Expertise , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19]  Pieter Abbeel,et al.  InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.

[20]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[21]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Steve Branson,et al.  Similarity metrics for categorization: From monolithic to category specific , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[23]  Michael I. Jordan,et al.  Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.

[24]  Yuting Zhang,et al.  Learning to Disentangle Factors of Variation with Manifold Interaction , 2014, ICML.

[25]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[26]  Kristen Grauman,et al.  Fine-Grained Visual Comparisons with Local Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Adam Tauman Kalai,et al.  Adaptively Learning the Crowd Kernel , 2011, ICML.

[28]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).