Learning Finer-class Networks for Universal Representations

Many real-world visual recognition use-cases cannot directly benefit from state-of-the-art CNN-based approaches because too little annotated data is available. The usual way to deal with this is to transfer a representation pre-learned on a large annotated source task to the target task of interest. This raises the question of how "universal" the original representation is, that is, how well it adapts directly to many different target tasks. To improve such universality, the state of the art trains networks on a diversified source problem, obtained by adding either generic or specific categories to the initial set of categories. In this vein, we propose a method that exploits classes finer than the most specific existing ones, for which no annotation is available, relying on unsupervised learning and a bottom-up split-and-merge strategy. We show that our method learns more universal representations than the state of the art, leading to significantly better results on ten target tasks from multiple domains, with several network architectures, either alone or combined with networks learned at a coarser semantic level.
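To make the "split" idea concrete, here is a minimal sketch, not the authors' exact pipeline: each annotated class is divided into finer pseudo-classes by clustering its image features (e.g. penultimate-layer CNN descriptors) without any extra annotation. The function name, the use of k-means, and the choice of k sub-classes per class are illustrative assumptions; the paper's actual clustering criterion and the subsequent merge step are not reproduced here.

```python
# Hypothetical sketch: turn each annotated class into k finer pseudo-classes
# by unsupervised clustering of per-class features (assumption: k-means).
import numpy as np
from sklearn.cluster import KMeans

def split_into_finer_classes(features, labels, k=2, seed=0):
    """Assign each sample a finer pseudo-label by clustering within its class.

    features: (n_samples, dim) array of image descriptors.
    labels:   (n_samples,) array of original class ids.
    k:        sub-classes per original class (illustrative choice).
    Returns an array of finer pseudo-labels in [0, n_classes * k).
    """
    finer = np.empty(len(labels), dtype=int)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        n_clusters = min(k, len(idx))  # guard against very small classes
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        sub = km.fit_predict(features[idx])
        finer[idx] = c * k + sub      # give each sub-cluster a global id
    return finer

if __name__ == "__main__":
    # Random data standing in for CNN features, purely for demonstration.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 128))
    labs = rng.integers(0, 10, size=1000)
    finer_labels = split_into_finer_classes(feats, labs, k=4)
    print(finer_labels.max() + 1, "finer pseudo-classes")
```

The finer pseudo-labels obtained this way could then serve as targets for training a network at a finer semantic level than the original annotation provides.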
