Paper information - Delta-encoder: an effective sample synthesis method for few-shot object recognition

Learning to classify new categories based on just one or a few examples is a long-standing challenge in modern computer vision. In this work, we propose a simple yet effective method for few-shot (and one-shot) object recognition. Our approach is based on a modified auto-encoder, denoted Delta-encoder, that learns to synthesize new samples for an unseen category from just a few examples of it. The synthesized samples are then used to train a classifier. The proposed approach learns both to extract transferable intra-class deformations, or "deltas", between same-class pairs of training examples, and to apply those deltas to the few provided examples of a novel class (unseen during training) in order to efficiently synthesize samples from that new class. The proposed method improves over the state of the art in one-shot object recognition and compares favorably in the few-shot case. Upon acceptance, code will be made available.
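
The extract-and-apply idea in the abstract can be illustrated with a toy sketch. This is not the authors' model: the learned auto-encoder is replaced here by plain vector subtraction and addition on hand-made 2-D features, and the classifier is a simple nearest-centroid rule. All names (`extract_delta`, `apply_delta`, `synthesize`, the class labels, and the feature values) are hypothetical and chosen only for illustration.

```python
import random


def extract_delta(x, y):
    """Stand-in for the encoder: the 'delta' between two same-class vectors.

    Assumption: a plain difference replaces the learned low-dimensional code.
    """
    return [a - b for a, b in zip(x, y)]


def apply_delta(anchor, delta):
    """Stand-in for the decoder: shift a novel-class anchor by a delta."""
    return [a + d for a, d in zip(anchor, delta)]


def synthesize(seen_classes, novel_example, n_samples, rng):
    """Transfer deltas from random same-class pairs onto one novel example."""
    samples = []
    for _ in range(n_samples):
        feats = rng.choice(list(seen_classes.values()))  # pick a seen class
        x, y = rng.sample(feats, 2)                      # same-class pair
        samples.append(apply_delta(novel_example, extract_delta(x, y)))
    return samples


def centroid(vectors):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def nearest_centroid(query, centroids):
    """Return the label whose centroid is closest to the query."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: sqdist(query, centroids[label]))


rng = random.Random(0)

# Toy features for classes seen during training (hypothetical values).
seen = {
    "cat": [[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2]],
    "dog": [[5.0, 5.0], [5.1, 4.8], [4.9, 5.2]],
}

# One example per novel class (the one-shot setting).
one_shot = {"fox": [10.0, 0.0], "owl": [0.0, 10.0]}

# Synthesize samples for each novel class, then fit the toy classifier.
synthetic = {label: synthesize(seen, ex, 20, rng) for label, ex in one_shot.items()}
centroids = {label: centroid(s + [one_shot[label]]) for label, s in synthetic.items()}

prediction = nearest_centroid([9.5, 0.4], centroids)
print(prediction)
```

In the method described above, both the delta extraction and its application are learned by the modified auto-encoder rather than fixed to subtraction and addition; the sketch only shows the data flow from same-class pairs, through deltas, to synthesized samples that train a classifier.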