One-Shot Image Recognition Using Prototypical Encoders with Reduced Hubness

Humans have the innate ability to recognize new objects just by looking at sketches of them (also referred to as prototype images). Similarly, prototypical images can serve as effective visual representations of unseen classes when tackling few-shot learning (FSL) tasks. Our main goal is to recognize unseen hand signs (gestures), traffic signs, and corporate logos given only their iconographic images, or prototypes. Previous works proposed variational prototyping-encoders (VPE) to address FSL problems. While VPE learns an image-to-image translation task efficiently, we discovered that its performance is significantly hampered by the so-called hubness problem (a few points in the latent space become the nearest neighbours of disproportionately many others) and that it fails to regulate the representations in the latent space. Hence, we propose a new model (VPE++) that inherently reduces hubness and incorporates contrastive and multi-task losses to increase the discriminative ability of FSL models. Results show that VPE++ can generalize better to unseen classes and can achieve higher accuracy than the state of the art on logo, traffic-sign, and hand-gesture datasets.
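The abstract rests on two mechanics: an unseen class is recognized by comparing a query's embedding with the embedding of that class's single prototype image, and hubness in the latent space distorts that comparison. The Python sketch below illustrates both under stated assumptions; it is not the VPE or VPE++ implementation. Embeddings are assumed to be plain lists of floats already produced by some encoder, distances are squared Euclidean, and the function names (`nearest_prototype`, `k_occurrence`, `hubness_skewness`) are hypothetical. Hubness is measured here as the skewness of the k-occurrence distribution N_k, a standard diagnostic; the sketch does not show how VPE++ reduces hubness.

```python
from typing import List, Sequence

# Hypothetical stand-in: an embedding is a plain sequence of floats,
# assumed to come from some already-trained encoder.
Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_prototype(query: Vector, prototypes: Sequence[Vector]) -> int:
    """One-shot prediction: index of the class prototype closest to the query."""
    return min(range(len(prototypes)), key=lambda c: sq_dist(query, prototypes[c]))


def k_occurrence(points: Sequence[Vector], k: int) -> List[int]:
    """N_k: how often each point appears among the k nearest neighbours
    of the other points (a point is never its own neighbour)."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Brute-force neighbour search; sorted() is stable, so ties keep index order.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sq_dist(points[i], points[j]))
        for j in others[:k]:
            counts[j] += 1
    return counts


def hubness_skewness(points: Sequence[Vector], k: int = 5) -> float:
    """Skewness of the N_k distribution; large positive values signal hubs."""
    counts = k_occurrence(points, k)
    n = len(counts)
    mean = sum(counts) / n  # equals k whenever k < n
    var = sum((c - mean) ** 2 for c in counts) / n
    if var == 0:
        return 0.0  # every point is equally popular: no hubs
    third = sum((c - mean) ** 3 for c in counts) / n
    return third / var ** 1.5


if __name__ == "__main__":
    # Toy "star": the origin is the nearest neighbour of all other points.
    star = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.1], [-1.2, 0.0], [0.0, -1.3]]
    print(k_occurrence(star, 1))        # [4, 1, 0, 0, 0]
    print(hubness_skewness(star, k=1))  # about 1.29
    print(nearest_prototype([4.0, 4.5], [[0.0, 0.0], [5.0, 5.0]]))  # 1
```

In the star-shaped toy example the centre point is the nearest neighbour of every other point, so N_1 is strongly right-skewed; a large positive skewness on real latent codes would indicate the kind of hub-dominated space the abstract describes. The brute-force neighbour search costs O(n² log n) and is only meant for small illustrations.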

Mohammad Norouzi | Chenxi Xiao