Unsupervised object discovery and localization in images and videos

This paper addresses unsupervised discovery and localization of dominant objects in a noisy collection of images or videos. The setting is fully unsupervised: it uses no class labels and makes no assumption that the collection contains a single dominant class. It is therefore far more general than typical colocalization or weakly supervised localization settings. Interestingly, our approach also discovers the underlying topology of the collection, linking images or frames that contain instances of the same object class. Conventional image and video understanding methods normally rely on supervisory information, in the form of class labels, for this role. We tackle the discovery and localization problem with part-based region matching: off-the-shelf region proposals are extracted to form a set of candidate bounding boxes for objects and object parts, and these regions are matched across images and frames. In each image or frame, a dominant object is localized by comparing the scores of candidate regions and selecting those that stand out over the other regions containing them. Given a video collection, we also associate similar object regions across consecutive frames of the same video, thus achieving unsupervised tracking. Extensive experimental evaluations on standard benchmarks show that the proposed approach substantially outperforms the state of the art in colocalization and achieves robust object discovery on challenging mixed-class datasets.
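
The localization step described above (score candidate regions by matching, then keep the region that stands out over the regions containing it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It replaces part-based region matching with plain cosine similarity between hypothetical per-region descriptors (`Region.feature`). Each region's confidence is the sum, over every other image, of its best match in that image. The "standout" score is taken to be a region's confidence minus the highest confidence among regions whose boxes enclose it (zero if none do). All names (`Region`, `region_confidence`, `standout_scores`, `localize`) and the toy data are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass(frozen=True)
class Region:
    box: Box
    feature: Tuple[float, ...]  # hypothetical appearance descriptor (assumption)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def region_confidence(images: Sequence[Sequence[Region]], i: int) -> List[float]:
    """Score each region of image i by summing its best match in every other image.

    Stand-in for part-based matching: plain descriptor similarity (assumption).
    """
    scores = []
    for r in images[i]:
        total = 0.0
        for j, other in enumerate(images):
            if j != i and other:
                total += max(cosine(r.feature, o.feature) for o in other)
        scores.append(total)
    return scores


def encloses(outer: Box, inner: Box) -> bool:
    """True if `outer` contains `inner` and the two boxes differ."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3]
            and outer != inner)


def standout_scores(regions: Sequence[Region], conf: Sequence[float]) -> List[float]:
    """Confidence minus the best confidence among enclosing regions (0 if none)."""
    out = []
    for k, r in enumerate(regions):
        containers = [conf[m] for m, q in enumerate(regions)
                      if m != k and encloses(q.box, r.box)]
        out.append(conf[k] - max(containers, default=0.0))
    return out


def localize(images: Sequence[Sequence[Region]], i: int) -> Box:
    """Return the box of the region with the highest standout score in image i."""
    conf = region_confidence(images, i)
    s = standout_scores(images[i], conf)
    best = max(range(len(s)), key=s.__getitem__)
    return images[i][best].box


# Toy collection: each image has a whole-image box with a distinct background
# descriptor and a smaller object box that shares one object descriptor.
images = [
    [Region((0, 0, 10, 10), (0.3, 1, 0, 0)), Region((2, 2, 6, 6), (1, 0, 0, 0))],
    [Region((0, 0, 10, 10), (0.3, 0, 1, 0)), Region((4, 4, 8, 8), (1, 0, 0, 0))],
    [Region((0, 0, 10, 10), (0.3, 0, 0, 1)), Region((1, 1, 5, 5), (1, 0, 0, 0))],
]
print([localize(images, i) for i in range(len(images))])
# prints [(2, 2, 6, 6), (4, 4, 8, 8), (1, 1, 5, 5)]
```

In the toy data, each whole-image box matches the other images only weakly because its background descriptor is distinct. The object box matches perfectly in every other image. Subtracting the container's confidence therefore leaves the object box with the highest standout score.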
