Beyond Object Proposals: Random Crop Pooling for Multi-Label Image Recognition

Learning high-level image representations using object proposals has achieved remarkable success in multi-label image recognition. However, most object proposals provide merely coarse information about the objects, and only carefully selected proposals can be helpful for boosting the performance of multi-label image recognition. In this paper, we propose an object-proposal-free framework for multi-label image recognition: random crop pooling (RCP). Basically, RCP performs stochastic scaling and cropping over images before feeding them to a standard convolutional neural network, which works quite well with a max-pooling operation for recognizing the complex contents of multi-label images. To better fit the multi-label image recognition task, we further develop a new loss function-the dynamic weighted Euclidean loss-for the training of the deep network. Our RCP approach is amazingly simple yet effective. It can achieve significantly better image recognition performance than the approaches using object proposals. Moreover, our adapted network can be easily trained in an end-to-end manner. Extensive experiments are conducted on two representative multi-label image recognition data sets (i.e., PASCAL VOC 2007 and PASCAL VOC 2012), and the results clearly demonstrate the superiority of our approach.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Bingbing Ni,et al.  HCP: A Flexible CNN Framework for Multi-Label Image Classification , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Ivan Laptev,et al.  Is object localization for free? - Weakly-supervised learning with convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Philip H. S. Torr,et al.  BING: Binarized normed gradients for objectness estimation at 300fps , 2014, Computational Visual Media.

[5]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[6]  Jianfei Cai,et al.  Can Partial Strong Labels Boost Multi-label Object Recognition? , 2015, ArXiv.

[7]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[8]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[9]  Yu Zhang,et al.  Exploit Bounding Box Annotations for Multi-Label Object Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Jason Weston,et al.  Large scale image annotation: learning to rank with joint word-image embeddings , 2010, Machine Learning.

[13]  Alan L. Yuille,et al.  Multi-Instance Visual-Semantic Embedding , 2015, BMVC 2017.

[14]  Jun Wang,et al.  Deep Attributes from Context-Aware Regional Neural Codes , 2015, ArXiv.

[15]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Dumitru Erhan,et al.  Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Yangqing Jia,et al.  Deep Convolutional Ranking for Multilabel Image Annotation , 2013, ICLR.

[18]  Jonghyun Choi,et al.  Multi-Directional Multi-Level Dual-Cross Patterns for Robust Face Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[20]  TaoDacheng,et al.  Large-Margin Multi-ViewInformation Bottleneck , 2014 .

[21]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Jian Dong,et al.  Subcategory-Aware Object Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Dacheng Tao,et al.  Asymptotic Generalization Bound of Fisher’s Linear Discriminant Analysis , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Dacheng Tao,et al.  Large-Margin Multi-ViewInformation Bottleneck , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[26]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[27]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[28]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[29]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[30]  Dacheng Tao,et al.  Classification with Noisy Labels by Importance Reweighting , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Bernt Schiele,et al.  How good are detection proposals, really? , 2014, BMVC.

[32]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[33]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[34]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Xiang Zhang,et al.  OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[36]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[37]  Bernt Schiele,et al.  What Makes for Effective Detection Proposals? , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Thomas Mensink,et al.  Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.