Weakly Supervised Object Localization Using Size Estimates

We present a technique for weakly supervised object localization (WSOL), building on the observation that WSOL algorithms usually work better on images with bigger objects. Instead of training the object detector on the entire training set at the same time, we propose a curriculum learning strategy to feed training images into the WSOL learning loop in an order from images containing bigger objects down to smaller ones. To automatically determine the order, we train a regressor to estimate the size of the object given the whole image as input. Furthermore, we use these size estimates to further improve the re-localization step of WSOL by assigning weights to object proposals according to how close their size matches the estimated object size. We demonstrate the effectiveness of using size order and size weighting on the challenging PASCAL VOC 2007 dataset, where we achieve a significant improvement over existing state-of-the-art WSOL techniques.

[1]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2003, ICTAI.

[2]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[3]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Chong Wang,et al.  Large-Scale Weakly Supervised Object Localization via Latent Category Learning , 2015, IEEE Transactions on Image Processing.

[5]  Yong Jae Lee,et al.  Weakly-supervised Discovery of Visual Pattern Configurations , 2014, NIPS.

[6]  Tao Xiang,et al.  Transfer Learning by Ranking for Weakly Supervised Object Annotation , 2017, BMVC.

[7]  Yong Jae Lee,et al.  Learning the easy things first: Self-paced visual category discovery , 2011, CVPR 2011.

[8]  Christoph H. Lampert,et al.  Curriculum learning of multiple tasks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Carsten Rother,et al.  Weakly supervised discriminative localization and classification: a joint learning process , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[10]  Greg Mori,et al.  Similarity Constrained Latent Support Vector Machine: An Application to Weakly Supervised Action Classification , 2012, ECCV.

[11]  Thomas Deselaers,et al.  What is an object? , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Cordelia Schmid,et al.  Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  R. Fisher The Advanced Theory of Statistics , 1943, Nature.

[14]  Yang Wang,et al.  Weakly supervised localization of novel objects using appearance transfer , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Tinne Tuytelaars,et al.  Weakly Supervised Detection with Posterior Regularization , 2014, BMVC.

[16]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[17]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[18]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[19]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Jason Weston,et al.  Curriculum learning , 2009, ICML '09.

[21]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[22]  Bernt Schiele,et al.  Learning using privileged information: SV M+ and weighted SVM , 2013, Neural Networks.

[23]  Christoph H. Lampert,et al.  Learning to Rank Using Privileged Information , 2013, 2013 IEEE International Conference on Computer Vision.

[24]  Zaïd Harchaoui,et al.  On learning to localize objects with minimal supervision , 2014, ICML.

[25]  Fei-Fei Li,et al.  Co-localization in Real-World Images , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Ming Yang,et al.  Regionlets for Generic Object Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[27]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[28]  Daphne Koller,et al.  Self-Paced Learning for Latent Variable Models , 2010, NIPS.

[29]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[30]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[31]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[32]  Trevor Darrell,et al.  LSDA: Large Scale Detection through Adaptation , 2014, NIPS.

[33]  Fei-Fei Li,et al.  Object-Centric Spatial Pooling for Image Classification , 2012, ECCV.

[34]  Dim P. Papadopoulos,et al.  How Hard Can It Be? Estimating the Difficulty of Visual Search in an Image , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Paul A. Viola,et al.  Multiple Instance Boosting for Object Detection , 2005, NIPS.

[36]  Thomas Deselaers,et al.  Localizing Objects While Learning Their Appearance , 2010, ECCV.

[37]  Matthieu Guillaumin,et al.  Large-scale knowledge transfer for object localization in ImageNet , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Tao Xiang,et al.  Weakly supervised object detector learning with model drift detection , 2011, 2011 International Conference on Computer Vision.

[39]  Tinne Tuytelaars,et al.  Weakly supervised object detection with convex clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Tao Xiang,et al.  Bayesian Joint Modelling for Object Localisation in Weakly Labelled Images , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Antonio Torralba,et al.  Unsupervised Detection of Regions of Interest Using Iterative Link Analysis , 2009, NIPS.

[42]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[43]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[44]  Alexei A. Efros,et al.  Ensemble of exemplar-SVMs for object detection and beyond , 2011, 2011 International Conference on Computer Vision.

[45]  T. Tuytelaars,et al.  Weakly Supervised Object Detection with Posterior Regularization , 2014 .

[46]  Thomas Deselaers,et al.  Measuring the Objectness of Image Windows , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Svetlana Lazebnik,et al.  Scene recognition and weakly supervised object localization with deformable part-based models , 2011, 2011 International Conference on Computer Vision.

[48]  Donald J. Wheeler,et al.  Understanding Statistical Process Control , 1986 .

[49]  Cordelia Schmid,et al.  Multi-fold MIL Training for Weakly Supervised Object Localization , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Cordelia Schmid,et al.  Learning object class detectors from weakly annotated video , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.