Learning Finer-class Networks for Universal Representations

Many real-world visual recognition use-cases cannot directly benefit from state-of-the-art CNN-based approaches because too little annotated data is available. The usual way to deal with this is to transfer a representation pre-learned on a large annotated source task to the target task of interest. This raises the question of how "universal" the original representation is, that is, how well it adapts directly to many different target tasks. To improve such universality, the state of the art trains networks on a diversified source problem, obtained by adding either generic or specific categories to the initial set of categories. In this vein, we propose a method that exploits classes finer than the most specific existing ones, for which no annotation is available, relying on unsupervised learning and a bottom-up split-and-merge strategy. We show that our method learns more universal representations than the state of the art, leading to significantly better results on ten target tasks from multiple domains, with several network architectures, either alone or combined with networks learned at a coarser semantic level.
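To make the "split" idea concrete, here is a minimal sketch, not the authors' exact pipeline: each annotated class is divided into finer pseudo-classes by clustering its image features (e.g. penultimate-layer CNN descriptors) without any extra annotation. The function name, the use of k-means, and the choice of k sub-classes per class are illustrative assumptions; the paper's actual clustering criterion and the subsequent merge step are not reproduced here.

```python
# Hypothetical sketch: turn each annotated class into k finer pseudo-classes
# by unsupervised clustering of per-class features (assumption: k-means).
import numpy as np
from sklearn.cluster import KMeans

def split_into_finer_classes(features, labels, k=2, seed=0):
    """Assign each sample a finer pseudo-label by clustering within its class.

    features: (n_samples, dim) array of image descriptors.
    labels:   (n_samples,) array of original class ids.
    k:        sub-classes per original class (illustrative choice).
    Returns an array of finer pseudo-labels in [0, n_classes * k).
    """
    finer = np.empty(len(labels), dtype=int)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        n_clusters = min(k, len(idx))  # guard against very small classes
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        sub = km.fit_predict(features[idx])
        finer[idx] = c * k + sub      # give each sub-cluster a global id
    return finer

if __name__ == "__main__":
    # Random data standing in for CNN features, purely for demonstration.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 128))
    labs = rng.integers(0, 10, size=1000)
    finer_labels = split_into_finer_classes(feats, labs, k=4)
    print(finer_labels.max() + 1, "finer pseudo-classes")
```

The finer pseudo-labels obtained this way could then serve as targets for training a network at a finer semantic level than the original annotation provides.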
