Paper Information - Few-Shot Learning With Localization in Realistic Settings

Traditional recognition methods typically require large, artificially-balanced training classes, while few-shot learning methods are tested on artificially small ones. In contrast to both extremes, real-world recognition problems exhibit heavy-tailed class distributions, with cluttered scenes and a mix of coarse and fine-grained class distinctions. We show that prior methods designed for few-shot learning do not work out of the box in these challenging conditions, based on a new “meta-iNat” benchmark. We introduce three parameter-free improvements: (a) better training procedures based on adapting cross-validation to meta-learning, (b) novel architectures that localize objects using limited bounding box annotations before classification, and (c) simple parameter-free expansions of the feature space based on bilinear pooling. Together, these improvements double the accuracy of state-of-the-art models on meta-iNat while generalizing to prior benchmarks, complex neural architectures, and settings with substantial domain shift.
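To make improvement (c) concrete, the following is a minimal sketch of generic bilinear pooling over a convolutional feature map. It is an illustration under stated assumptions, not the authors' implementation: the function name `bilinear_pool`, the (channels, height, width) input layout, and the signed square root plus L2 normalization step (a common choice in the bilinear CNN literature) are all assumptions that the abstract does not specify. The point it shows is that the expansion from C channels to C*C features adds no learned parameters.

```python
import numpy as np


def bilinear_pool(features: np.ndarray) -> np.ndarray:
    """Expand a (C, H, W) feature map into a C*C vector with no learned weights.

    Hypothetical helper for illustration only; the layout and the
    normalization steps are assumptions, not taken from the paper.
    """
    c, h, w = features.shape
    # Flatten the spatial grid: one C-dimensional descriptor per location.
    x = features.reshape(c, h * w)
    # Average the outer product of each descriptor with itself over locations.
    outer = (x @ x.T) / (h * w)
    vec = outer.reshape(-1)
    # Signed square root followed by L2 normalization (assumed post-processing).
    vec = np.sign(vec) * np.sqrt(np.abs(vec))
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


# Usage: a random 8-channel, 5x5 feature map stands in for a CNN output.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 5, 5))
pooled = bilinear_pool(fmap)
```

Here `pooled` has 64 entries for an 8-channel input, so the feature space grows quadratically in the channel count while the number of trainable parameters stays unchanged.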

Bharath Hariharan | Davis Wertheimer
