Foreground Focus: Finding Meaningful Features in Unlabeled Images

We present a method to automatically discover meaningful features in unlabeled image collections. Each image is decomposed into semi-local features that describe neighborhood appearance and geometry. The goal is to determine for each image which of these parts are most relevant, given the image content in the remainder of the collection. Our method first c omputes an initial image-level grouping based on feature correspondences, and then iteratively refines cluster assignments based on the evolving in tra-cluster pattern of local matches. As a result, the significance attributed to each feature influences an image’s cluster membership, while related image s in a cluster affect the estimated significance of their features. We show that this mutual reinforcement of object-level and feature-level similari ty improves unsupervised image clustering, and apply the technique to automatically discover categories and foreground regions in images from benchmark datasets.

[1]  Tsuhan Chen,et al.  Unsupervised Image Categorization and Object Localization using Topic Models and Correspondences between Images , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[2]  Paul Newman,et al.  Probabilistic Appearance Based Navigation and Loop Closing , 2007, Proceedings 2007 IEEE International Conference on Robotics and Automation.

[3]  Stefano Soatto,et al.  Proximity Distribution Kernels for Geometric Context in Category Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[4]  Cordelia Schmid,et al.  Semi-Local Affine Parts for Object Recognition , 2004, BMVC.

[5]  Yong Jae Lee,et al.  Foreground Focus: Unsupervised Learning from Partially Matching Images , 2009, International Journal of Computer Vision.

[6]  Tsuhan Chen,et al.  Semantic-Shift for Unsupervised Object Detection , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[7]  Cordelia Schmid,et al.  Spatial Weighting for Bag-of-Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10]  Alexei A. Efros,et al.  Discovering object categories in image collections , 2005 .

[11]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[12]  Alexei A. Efros,et al.  Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Brendan J. Frey,et al.  Non-metric affinity propagation for unsupervised image categorization , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[14]  Luc Van Gool,et al.  Efficient Mining of Frequent and Distinctive Feature Configurations , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[15]  Pietro Perona,et al.  Self-Tuning Spectral Clustering , 2004, NIPS.

[16]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[18]  Nebojsa Jojic,et al.  LOCUS: learning object classes with unsupervised segmentation , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[19]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[20]  Pietro Perona,et al.  Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[21]  Andrew Zisserman,et al.  An Exemplar Model for Learning Object Classes , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Cordelia Schmid,et al.  Selection of scale-invariant parts for object class recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[23]  Andrew Blake,et al.  Cosegmentation of Image Pairs by Histogram Matching - Incorporating a Global Constraint into MRFs , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[24]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004 .

[25]  Cordelia Schmid,et al.  A sparse texture representation using affine-invariant regions , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[26]  Trevor Darrell,et al.  Fast contour matching using approximate earth mover's distance , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[27]  Peter Auer,et al.  Generic object recognition with boosting , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Yong Jae Lee,et al.  Discovering Multi-Aspect Structure to Learn From Loosely Labeled Image Collections , 2008 .

[29]  Luc Van Gool,et al.  Modeling scenes with local descriptors and latent aspects , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[30]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[31]  Jitendra Malik,et al.  Normalized Cuts and Image Segmentation , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[32]  TomasiCarlo,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000 .

[33]  Andrew Zisserman,et al.  Video data mining using configurations of viewpoint invariant regions , 2004, CVPR 2004.

[34]  Trevor Darrell,et al.  Unsupervised Learning of Categories from Sets of Partially Matching Image Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[35]  Pietro Perona,et al.  Unsupervised Learning of Models for Recognition , 2000, ECCV.

[36]  Ankur Agarwal,et al.  Hyperfeatures - Multilevel Local Coding for Visual Recognition , 2006, ECCV.

[37]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[38]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[39]  Carla E. Brodley,et al.  Feature Selection for Unsupervised Learning , 2004, J. Mach. Learn. Res..