CNN: Single-label to Multi-label

Convolutional Neural Networks (CNNs) have demonstrated promising performance in single-label image classification tasks. However, how a CNN can best cope with multi-label images remains an open problem, mainly due to complex underlying object layouts and insufficient multi-label training images. In this work, we propose a flexible deep CNN infrastructure, called Hypotheses-CNN-Pooling (HCP), in which an arbitrary number of object segment hypotheses are taken as inputs, a shared CNN is connected to each hypothesis, and the CNN outputs from the different hypotheses are aggregated with max pooling to produce the final multi-label predictions. Unique characteristics of this flexible deep CNN infrastructure include: 1) no ground-truth bounding box information is required for training; 2) the whole HCP infrastructure is robust to possibly noisy and/or redundant hypotheses; 3) no explicit hypothesis label is required; 4) the shared CNN may be pre-trained on a large-scale single-label image dataset, e.g. ImageNet; and 5) it naturally outputs multi-label prediction results. Experimental results on the Pascal VOC2007 and VOC2012 multi-label image datasets demonstrate the superiority of the proposed HCP infrastructure over other state-of-the-art methods. In particular, on the VOC2012 dataset the mAP reaches 84.2% with HCP alone and 90.3% after fusion with our complementary result in [47] based on hand-crafted features, which outperforms the state of the art by a large margin of more than 7%.
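The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `shared_scorer` is a hypothetical linear stand-in for the shared CNN, each hypothesis is assumed to be already encoded as a small feature vector, and all weights and inputs are random placeholders. Only the structure follows the abstract: the same scorer is applied to every hypothesis, and the per-class scores are max-pooled across hypotheses.

```python
import random


def shared_scorer(hypothesis, weights):
    """Hypothetical stand-in for the shared CNN: one score per class.

    The same weights are used for every hypothesis, which is what makes
    the scorer "shared".
    """
    return [sum(x * w for x, w in zip(hypothesis, class_w)) for class_w in weights]


def hcp_predict(hypotheses, weights):
    """Score every hypothesis, then max-pool each class across hypotheses."""
    per_hypothesis = [shared_scorer(h, weights) for h in hypotheses]
    # zip(*...) regroups the scores by class; max keeps the strongest response.
    return [max(class_scores) for class_scores in zip(*per_hypothesis)]


random.seed(0)
NUM_CLASSES = 20  # Pascal VOC has 20 object classes
FEAT_DIM = 8      # assumed toy feature size, chosen only for illustration

# Random placeholder weights and hypotheses; a real system would use a
# trained CNN and image regions proposed by a hypothesis generator.
weights = [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(NUM_CLASSES)]
hypotheses = [[random.gauss(0, 1) for _ in range(FEAT_DIM)] for _ in range(5)]

image_scores = hcp_predict(hypotheses, weights)
print(len(image_scores))  # one pooled score per class
```

Because the pooling is a per-class maximum, the number of hypotheses can vary from image to image, and adding a duplicate hypothesis leaves the pooled scores unchanged, which is consistent with the robustness to redundant hypotheses claimed in the abstract.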