Paper Information - Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning

Weakly Supervised Object Localization with Multi-Fold Multiple Instance Learning

Object category localization is a challenging problem in computer vision. Standard supervised training requires bounding box annotations of object instances. This time-consuming annotation process is sidestepped in weakly supervised learning. In this case, the supervised information is restricted to binary labels that indicate the absence/presence of object instances in the image, without their locations. We follow a multiple-instance learning approach that iteratively trains the detector and infers the object locations in the positive training images. Our main contribution is a multi-fold multiple instance learning procedure, which prevents training from prematurely locking onto erroneous object locations. This procedure is particularly important when using high-dimensional representations, such as Fisher vectors and convolutional neural network features. We also propose a window refinement method, which improves the localization accuracy by incorporating an objectness prior. We present a detailed experimental evaluation using the PASCAL VOC 2007 dataset, which verifies the effectiveness of our approach.
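The multi-fold idea described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the algorithm, so everything here is an assumption made for illustration: the cross-validation-style fold split, the function names (`train_linear`, `score`, `multi_fold_mil`), the choice to initialise every positive image with window 0 (imagined as a full-image window), and the toy difference-of-means classifier that stands in for the real detector. The point it tries to show is that the windows of a positive image are re-localized by a detector that was not trained on that image, which is one plausible way to keep training from locking onto its own earlier choices.

```python
import random


def train_linear(pos, neg):
    """Toy stand-in for detector training (assumption): difference of class means."""
    dim = len(pos[0])
    mean = lambda rows, j: sum(r[j] for r in rows) / len(rows)
    return [mean(pos, j) - mean(neg, j) for j in range(dim)]


def score(w, x):
    """Linear score of one window feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def multi_fold_mil(bags, negatives, n_folds=3, n_iters=5, seed=0):
    """Hypothetical multi-fold MIL loop.

    bags:      positive images, each a list of window feature vectors
    negatives: window feature vectors taken from negative images
    Returns the index of the selected window for each positive image.
    """
    if n_folds < 2 or len(bags) < 2:
        raise ValueError("need at least 2 folds and 2 positive images")
    rng = random.Random(seed)
    # Assumption: window 0 of every image is the initial guess.
    selected = [0] * len(bags)
    order = list(range(len(bags)))
    for _ in range(n_iters):
        rng.shuffle(order)
        folds = [order[k::n_folds] for k in range(n_folds)]
        new_selected = list(selected)
        for held_out in folds:
            held = set(held_out)
            # Train only on positives from the other folds.
            train_pos = [bags[i][selected[i]] for i in order if i not in held]
            w = train_linear(train_pos, negatives)
            # Re-localize the held-out images with that detector.
            for i in held_out:
                new_selected[i] = max(
                    range(len(bags[i])), key=lambda j: score(w, bags[i][j])
                )
        selected = new_selected
    return selected
```

Calling `multi_fold_mil(bags, negatives)` on a list of positive images returns one window index per image. In standard MIL the detector would instead be trained on all positives and then used to re-localize those same images; the only change in this sketch is the held-out split inside each iteration.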
