Multi-Level Semantic Feature Augmentation for One-Shot Learning

The ability to quickly recognize and learn new visual concepts from limited samples enable humans to quickly adapt to new tasks and environments. This ability is enabled by the semantic association of novel concepts with those that have already been learned and stored in memory. Computers can start to ascertain similar abilities by utilizing a semantic concept space. A concept space is a high-dimensional semantic space in which similar abstract concepts appear close and dissimilar ones far apart. In this paper, we propose a novel approach to one-shot learning that builds on this core idea. Our approach learns to map a novel sample instance to a concept, relates that concept to the existing ones in the concept space and, using these relationships, generates new instances, by interpolating among the concepts, to help learning. Instead of synthesizing new image instance, we propose to directly synthesize instance features by leveraging semantics using a novel auto-encoder network called dual TriNet. The encoder part of the TriNet learns to map multi-layer visual features from CNN to a semantic vector. In semantic space, we search for related concepts, which are then projected back into the image feature spaces by the decoder portion of the TriNet. Two strategies in the semantic space are explored. Notably, this seemingly simple strategy results in complex augmented feature distributions in the image feature space, leading to substantially better performance.

[1]  Barbara Caputo,et al.  The More You Know, the Less You Learn: From Knowledge Transfer to One-shot Learning of Object Categories , 2009, BMVC.

[2]  Martial Hebert,et al.  Low-Shot Learning from Imaginary Data , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[4]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Joshua Achiam,et al.  On First-Order Meta-Learning Algorithms , 2018, ArXiv.

[6]  Shimon Ullman,et al.  Uncovering shared structures in multiclass classification , 2007, ICML '07.

[7]  Joan Bruna,et al.  Few-Shot Learning with Graph Neural Networks , 2017, ICLR.

[8]  Kate Saenko,et al.  Learning Deep Object Detectors from 3D Models , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[9]  Bharath Hariharan,et al.  Low-Shot Visual Recognition by Shrinking and Hallucinating Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Cordelia Schmid,et al.  MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild , 2016, NIPS.

[11]  Ricardo Vilalta,et al.  A Perspective View and Survey of Meta-Learning , 2002, Artificial Intelligence Review.

[12]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Andrea Vedaldi,et al.  Understanding deep image representations by inverting them , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[15]  Zi Huang,et al.  Multi-attention Network for One Shot Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Hugo Larochelle,et al.  Optimization as a Model for Few-Shot Learning , 2016, ICLR.

[17]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Antonio Torralba,et al.  Sharing Visual Features for Multiclass and Multiview Object Detection , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Daan Wierstra,et al.  One-shot Learning with Memory-Augmented Neural Networks , 2016, ArXiv.

[20]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[21]  Hong Yu,et al.  Meta Networks , 2017, ICML.

[22]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[23]  Nikos Komodakis,et al.  Dynamic Few-Shot Visual Learning Without Forgetting , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Antonio Torralba,et al.  Using the forest to see the trees: exploiting context for visual object detection and localization , 2010, CACM.

[25]  Wenjun Zeng,et al.  Deeply-Fused Nets , 2016, ArXiv.

[26]  Charless C. Fowlkes,et al.  Do We Need More Training Data? , 2015, International Journal of Computer Vision.

[27]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[28]  Andrew Zisserman,et al.  Incremental learning of object detectors using a visual shape alphabet , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[29]  Sanja Fidler,et al.  Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[31]  Wei Shen,et al.  Few-Shot Image Recognition by Predicting Parameters from Activations , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Michael Fink,et al.  Object Classification from a Single Example Utilizing Class Relevance Metrics , 2004, NIPS.

[33]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[34]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[35]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[36]  Joshua B. Tenenbaum,et al.  One-shot learning by inverting a compositional causal process , 2013, NIPS.

[37]  Luc Van Gool,et al.  One-Shot Video Object Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Daphna Weinshall,et al.  Learning a kernel function for classification with small training samples , 2006, ICML.

[39]  Deva Ramanan,et al.  Articulated pose estimation with tiny synthetic videos , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[40]  Yair Movshovitz-Attias,et al.  Ontological supervision for fine grained classification of Street View storefronts , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Thomas Brox,et al.  Learning to generate chairs with convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Rogério Schmidt Feris,et al.  Delta-encoder: an effective sample synthesis method for few-shot object recognition , 2018, NeurIPS.

[43]  XiangTao,et al.  Transductive Multi-View Zero-Shot Learning , 2015 .

[44]  Nuno Vasconcelos,et al.  Feature Space Transfer for Data Augmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[46]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[48]  Raymond Y. K. Lau,et al.  Least Squares Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[49]  Trevor Darrell,et al.  Transfer learning for image classification with sparse prototype representations , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Bin Wu,et al.  Deep Meta-Learning: Learning to Learn in the Concept Space , 2018, ArXiv.

[51]  Yi Yang,et al.  Transductive Propagation Network for Few-shot Learning , 2018, ArXiv.

[52]  Alexei A. Efros,et al.  Generative Visual Manipulation on the Natural Image Manifold , 2016, ECCV.

[53]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[54]  Derek Hoiem,et al.  Learning without Forgetting , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Leonidas J. Guibas,et al.  Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[56]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Artëm Yankov,et al.  Few-Shot Learning with Metric-Agnostic Conditional Embeddings , 2018, ArXiv.

[58]  Sebastian Thrun,et al.  Learning To Learn: Introduction , 1996 .

[59]  Trevor Darrell,et al.  Deep Layer Aggregation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[60]  Tao Mei,et al.  Memory Matching Networks for One-Shot Image Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[61]  Bernt Schiele,et al.  Transfer Learning in a Transductive Setting , 2013, NIPS.

[62]  Bernt Schiele,et al.  Generative Adversarial Text to Image Synthesis , 2016, ICML.

[63]  Lior Wolf,et al.  Robust boosting for learning from few examples , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[64]  Luca Bertinetto,et al.  Meta-learning with differentiable closed-form solvers , 2018, ICLR.

[65]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[66]  Shimon Ullman,et al.  Cross-generalization: learning novel classes from a single example by feature replacement , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[67]  Nuno Vasconcelos,et al.  AGA: Attribute-Guided Augmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[68]  Hang Li,et al.  Meta-SGD: Learning to Learn Quickly for Few Shot Learning , 2017, ArXiv.

[69]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[70]  Kumar Chellapilla,et al.  Personalized handwriting recognition via biased regularization , 2006, ICML.

[71]  Byron Boots,et al.  One-Shot Learning for Semantic Segmentation , 2017, BMVC.

[72]  Yanwei Fu,et al.  Semi-supervised Vocabulary-Informed Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[73]  Yair Movshovitz-Attias,et al.  Dataset Curation through Renders and Ontology Matching , 2015 .

[74]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[75]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[76]  Antonio Torralba,et al.  Transfer Learning by Borrowing Examples for Multiclass Object Detection , 2011, NIPS.

[77]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[78]  Gilles Blanchard,et al.  Pattern Recognition from One Example by Chopping , 2005, NIPS.

[79]  Tal Hassner,et al.  The One-Shot similarity kernel , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[80]  Yi Yang,et al.  Few-Shot Object Recognition from Machine-Labeled Web Images , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[81]  Anil A. Bharath,et al.  A data augmentation methodology for training machine/deep learning gait recognition algorithms , 2016, BMVC.

[82]  Tapani Raiko,et al.  Semi-supervised Learning with Ladder Networks , 2015, NIPS.

[83]  Bernt Schiele,et al.  What helps where – and why? Semantic relatedness for knowledge transfer , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[84]  Daan Wierstra,et al.  One-Shot Generalization in Deep Generative Models , 2016, ICML.

[85]  Gregory R. Koch,et al.  Siamese Neural Networks for One-Shot Image Recognition , 2015 .

[86]  John E. Hopcroft,et al.  Stacked Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[87]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[88]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[89]  Pieter Abbeel,et al.  A Simple Neural Attentive Meta-Learner , 2017, ICLR.

[90]  Martial Hebert,et al.  Learning to Learn: Model Regression Networks for Easy Small Sample Learning , 2016, ECCV.

[91]  Martial Hebert,et al.  Learning from Small Sample Sets by Combining Unsupervised Meta-Training with CNNs , 2016, NIPS.

[92]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[93]  Sridhar Mahadevan,et al.  Generative Multi-Adversarial Networks , 2016, ICLR.