Low-Shot Visual Recognition by Shrinking and Hallucinating Features

Low-shot visual learning–the ability to recognize novel object categories from very few examples–is a hallmark of human visual intelligence. Existing machine learning approaches fail to generalize in the same way. To make progress on this foundational problem, we present a low-shot learning benchmark on complex images that mimics challenges faced by recognition systems in the wild. We then propose (1) representation regularization techniques, and (2) techniques to hallucinate additional training examples for data-starved classes. Together, our methods improve the effectiveness of convolutional networks in low-shot learning, improving the one-shot accuracy on novel classes by 2.3× on the challenging ImageNet dataset.

[1]  Sebastian Thrun,et al.  Is Learning The n-th Thing Any Easier Than Learning The First? , 1995, NIPS.

[2]  Joshua B. Tenenbaum,et al.  One-Shot Learning with a Hierarchical Nonparametric Bayesian Model , 2011, ICML Unsupervised and Transfer Learning.

[3]  Gregory R. Koch,et al.  Siamese Neural Networks for One-Shot Image Recognition , 2015 .

[4]  Joshua B. Tenenbaum,et al.  One-shot learning by inverting a compositional causal process , 2013, NIPS.

[5]  Trevor Darrell,et al.  Latent Task Adaptation with Large-Scale Hierarchies , 2013, 2013 IEEE International Conference on Computer Vision.

[6]  Joshua B. Tenenbaum,et al.  Human-level concept learning through probabilistic program induction , 2015, Science.

[7]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[8]  Luca Bertinetto,et al.  Learning feed-forward one-shot learners , 2016, NIPS.

[9]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[10]  Samy Bengio,et al.  Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.

[11]  Atsuto Maki,et al.  From generic to specific deep representations for visual recognition , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[12]  Andrew Zisserman,et al.  Incremental learning of object detectors using a visual shape alphabet , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Sanja Fidler,et al.  Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14]  Hugo Larochelle,et al.  Optimization as a Model for Few-Shot Learning , 2016, ICLR.

[15]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Ross B. Girshick,et al.  Reducing Overfitting in Deep Networks by Decorrelating Representations , 2015, ICLR.

[17]  Lauren A. Schmidt Meaning and compositionality as statistical induction of categories and constraints , 2009 .

[18]  Ali Farhadi,et al.  Attribute-centric recognition for cross-category generalization , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[19]  Philip H. S. Torr,et al.  Prototypical Priors: From Improving Classification to Zero-Shot Learning , 2015, BMVC.

[20]  Michael Fink,et al.  Object Classification from a Single Example Utilizing Class Relevance Metrics , 2004, NIPS.

[21]  Alan L. Yuille,et al.  One Shot Learning via Compositions of Meaningful Patches , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[22]  Andrew Y. Ng,et al.  Zero-Shot Learning Through Cross-Modal Transfer , 2013, NIPS.

[23]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Simon Haykin,et al.  GradientBased Learning Applied to Document Recognition , 2001 .

[25]  Venkatesh Saligrama,et al.  Zero-Shot Learning via Semantic Similarity Embedding , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[27]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[28]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[29]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Wei-Lun Chao,et al.  An Empirical Study and Analysis of Generalized Zero-Shot Learning for Object Recognition in the Wild , 2016, ECCV.

[31]  Shimon Ullman,et al.  Cross-generalization: learning novel classes from a single example by feature replacement , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[32]  Nuno Vasconcelos,et al.  AGA: Attribute-Guided Augmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Marc'Aurelio Ranzato,et al.  Unsupervised Learning of Feature Hierarchies , 2009 .

[34]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[37]  Jitendra Malik,et al.  Analyzing the Performance of Multilayer Neural Networks for Object Recognition , 2014, ECCV.

[38]  Bruno A. Olshausen,et al.  Discovering Hidden Factors of Variation in Deep Networks , 2014, ICLR.

[39]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[40]  Ming Yang,et al.  Web-scale training for face identification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[42]  Martial Hebert,et al.  Learning to Learn: Model Regression Networks for Easy Small Sample Learning , 2016, ECCV.

[43]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[44]  Lior Wolf,et al.  A theoretical framework for deep transfer learning , 2016 .

[45]  Babak Saleh,et al.  Write a Classifier: Zero-Shot Learning Using Purely Textual Descriptions , 2013, 2013 IEEE International Conference on Computer Vision.

[46]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[47]  Manohar Paluri,et al.  Metric Learning with Adaptive Density Discrimination , 2015, ICLR.

[48]  Lior Wolf,et al.  The Multiverse Loss for Robust Transfer Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Paul A. Viola,et al.  Learning from one example through shared densities on transforms , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[50]  Philip H. S. Torr,et al.  An embarrassingly simple approach to zero-shot learning , 2015, ICML.

[51]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).