Semantic Feature Augmentation in Few-shot Learning

A fundamental problem with few-shot learning is the scarcity of data in training. A natural solution to alleviate this scarcity is to augment the existing images for each training class. However, directly augmenting samples in image space may not necessarily, nor sufficiently, explore the intra-class variation. To this end, we propose to directly synthesize instance features by leveraging the semantics of each class. Essentially, a novel auto-encoder network dual TriNet, is proposed for feature augmentation. The encoder TriNet projects multi-layer visual features of deep CNNs into the semantic space. In this space, data augmentation is induced, and the augmented instance representation is projected back into the image feature spaces by the decoder TriNet. Two data argumentation strategies in the semantic space are explored; notably these seemingly simple augmentations in semantic space result in complex augmented feature distributions in the image feature space, resulting in substantially better performance. The code and models of our paper will be published on: this https URL

[1]  Antonio Torralba,et al.  Transfer Learning by Borrowing Examples for Multiclass Object Detection , 2011, NIPS.

[2]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Sebastian Thrun,et al.  Learning To Learn: Introduction , 1996 .

[4]  Artëm Yankov,et al.  Few-Shot Learning with Metric-Agnostic Conditional Embeddings , 2018, ArXiv.

[5]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[6]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Bernt Schiele,et al.  Transfer Learning in a Transductive Setting , 2013, NIPS.

[8]  Hang Li,et al.  Meta-SGD: Learning to Learn Quickly for Few Shot Learning , 2017, ArXiv.

[9]  Alexei A. Efros,et al.  Generative Visual Manipulation on the Natural Image Manifold , 2016, ECCV.

[10]  Derek Hoiem,et al.  Learning without Forgetting , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Lior Wolf,et al.  Robust boosting for learning from few examples , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[12]  Martial Hebert,et al.  Learning from Small Sample Sets by Combining Unsupervised Meta-Training with CNNs , 2016, NIPS.

[13]  John E. Hopcroft,et al.  Stacked Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[15]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[16]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[17]  Kumar Chellapilla,et al.  Personalized handwriting recognition via biased regularization , 2006, ICML.

[18]  Byron Boots,et al.  One-Shot Learning for Semantic Segmentation , 2017, BMVC.

[19]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Sridhar Mahadevan,et al.  Generative Multi-Adversarial Networks , 2016, ICLR.

[21]  Antonio Torralba,et al.  Using the forest to see the trees: exploiting context for visual object detection and localization , 2010, CACM.

[22]  Ricardo Vilalta,et al.  A Perspective View and Survey of Meta-Learning , 2002, Artificial Intelligence Review.

[23]  Andrea Vedaldi,et al.  Understanding deep image representations by inverting them , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Charless C. Fowlkes,et al.  Do We Need More Training Data? , 2015, International Journal of Computer Vision.

[25]  Yoshua Bengio,et al.  How transferable are features in deep neural networks? , 2014, NIPS.

[26]  Hugo Larochelle,et al.  Optimization as a Model for Few-Shot Learning , 2016, ICLR.

[27]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[28]  Michael Fink,et al.  Object Classification from a Single Example Utilizing Class Relevance Metrics , 2004, NIPS.

[29]  Barbara Caputo,et al.  The More You Know, the Less You Learn: From Knowledge Transfer to One-shot Learning of Object Categories , 2009, BMVC.

[30]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[31]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[32]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[33]  Hong Yu,et al.  Meta Networks , 2017, ICML.

[34]  XiangTao,et al.  Transductive Multi-View Zero-Shot Learning , 2015 .

[35]  Yair Movshovitz-Attias,et al.  Ontological supervision for fine grained classification of Street View storefronts , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Raymond Y. K. Lau,et al.  Least Squares Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  Trevor Darrell,et al.  Transfer learning for image classification with sparse prototype representations , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[39]  Bernt Schiele,et al.  Generative Adversarial Text to Image Synthesis , 2016, ICML.

[40]  Wenjun Zeng,et al.  Deeply-Fused Nets , 2016, ArXiv.

[41]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[42]  Joshua B. Tenenbaum,et al.  One-shot learning by inverting a compositional causal process , 2013, NIPS.

[43]  Martial Hebert,et al.  Learning to Learn: Model Regression Networks for Easy Small Sample Learning , 2016, ECCV.

[44]  Shimon Ullman,et al.  Uncovering shared structures in multiclass classification , 2007, ICML '07.

[45]  Shimon Ullman,et al.  Cross-generalization: learning novel classes from a single example by feature replacement , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[46]  Nuno Vasconcelos,et al.  AGA: Attribute-Guided Augmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Yanwei Fu,et al.  Semi-supervised Vocabulary-Informed Learning , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Yair Movshovitz-Attias,et al.  Dataset Curation through Renders and Ontology Matching , 2015 .

[49]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[50]  Trevor Darrell,et al.  Deep Layer Aggregation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[51]  Luc Van Gool,et al.  One-Shot Video Object Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Daphna Weinshall,et al.  Learning a kernel function for classification with small training samples , 2006, ICML.

[53]  Deva Ramanan,et al.  Articulated pose estimation with tiny synthetic videos , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[54]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[55]  Kate Saenko,et al.  Learning Deep Object Detectors from 3D Models , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[56]  Daan Wierstra,et al.  One-shot Learning with Memory-Augmented Neural Networks , 2016, ArXiv.

[57]  Norbert Jankowski,et al.  Meta-Learning in Computational Intelligence , 2013, Meta-Learning in Computational Intelligence.

[58]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[59]  Antonio Torralba,et al.  Sharing Visual Features for Multiclass and Multiview Object Detection , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60]  Tapani Raiko,et al.  Semi-supervised Learning with Ladder Networks , 2015, NIPS.

[61]  Thomas Brox,et al.  Learning to generate chairs with convolutional neural networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Bharath Hariharan,et al.  Low-Shot Visual Recognition by Shrinking and Hallucinating Features , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[63]  Cordelia Schmid,et al.  MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild , 2016, NIPS.

[64]  Gilles Blanchard,et al.  Pattern Recognition from One Example by Chopping , 2005, NIPS.

[65]  Tal Hassner,et al.  The One-Shot similarity kernel , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[66]  Yi Yang,et al.  Few-Shot Object Recognition from Machine-Labeled Web Images , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Anil A. Bharath,et al.  A data augmentation methodology for training machine/deep learning gait recognition algorithms , 2016, BMVC.

[68]  Leonidas J. Guibas,et al.  Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[69]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[70]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[71]  Bernt Schiele,et al.  What helps where – and why? Semantic relatedness for knowledge transfer , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[72]  Daan Wierstra,et al.  One-Shot Generalization in Deep Generative Models , 2016, ICML.

[73]  Christoph H. Lampert,et al.  Attribute-Based Classification for Zero-Shot Visual Object Categorization , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[74]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[75]  Tao Xiang,et al.  Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[76]  Andrew Zisserman,et al.  Incremental learning of object detectors using a visual shape alphabet , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[77]  Sanja Fidler,et al.  Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[78]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[79]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[80]  Bin Wu,et al.  Deep Meta-Learning: Learning to Learn in the Concept Space , 2018, ArXiv.

[81]  Gregory R. Koch,et al.  Siamese Neural Networks for One-Shot Image Recognition , 2015 .