Unsupervised Visual Object Categorisation with BoF and Spatial Matching

The ultimate challenge of image categorisation is unsupervised object discovery, where both the selection of categories and the assignment of the given images to these categories are performed automatically. The unsupervised setting rules out the best discriminative methods, and in the comparison by Tuytelaars et al. [30] the standard Bag-of-Features (BoF) approach performed best. The downside of the BoF is that it discards the spatial information of the local features. In this work, we propose a novel unsupervised image categorisation method which uses the BoF as a pre-filter to find initial matches for each image, and then refines and ranks these matches using spatial matching of local features. Unsupervised visual object discovery is performed by the normalised cuts algorithm, which produces the clusterings from a similarity matrix of the spatial match scores. In our experiments, the proposed approach outperforms the best method in Tuytelaars et al. on the Caltech-101, randomised Caltech-101, and Caltech-256 data sets. For a large number of classes in particular, the improvements are clear and statistically significant.
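The three stages described above (BoF pre-filter, spatial re-scoring, normalised cuts on the resulting similarity matrix) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cosine similarity used for the pre-filter, and the `spatial_score` callable are assumptions introduced here for illustration, and the clustering step shows only the two-way normalised cut (thresholding the second generalised eigenvector at zero), whereas the paper clusters into many categories.

```python
import numpy as np

def bof_prefilter(histograms, k):
    """For each image, return the indices of its k nearest images
    by cosine similarity of BoF histograms (assumed pre-filter)."""
    h = np.asarray(histograms, dtype=float)
    h = h / np.linalg.norm(h, axis=1, keepdims=True)
    sim = h @ h.T
    np.fill_diagonal(sim, -np.inf)  # never return an image as its own match
    return np.argsort(-sim, axis=1)[:, :k]

def build_similarity(candidates, spatial_score, n):
    """Fill a symmetric n x n matrix with spatial match scores,
    evaluated only for the pre-filtered candidate pairs.
    `spatial_score(i, j)` is a hypothetical user-supplied callable."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in candidates[i]:
            j = int(j)
            s = max(W[i, j], spatial_score(i, j))
            W[i, j] = W[j, i] = s
    return W

def two_way_ncut(W):
    """Two-way normalised cut: take the second-smallest eigenvector of the
    symmetric normalised Laplacian, map it back to the generalised
    eigenproblem (D - W) y = lambda D y, and threshold at zero."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]
    return (y > 0).astype(int)
```

Restricting the spatial matching to the pre-filtered candidates keeps the number of expensive pairwise comparisons proportional to `n * k` rather than `n * n`. Which of the two groups receives label 1 depends on the sign of the eigenvector and is arbitrary.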

[1] Gabriela Csurka, et al. Visual categorization with bags of keypoints, 2002, ECCV 2004.

[2] Alexei A. Efros, et al. Unsupervised discovery of visual object class hierarchies, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[3] Joni-Kristian Kämäräinen, et al. Local Feature Based Unsupervised Alignment of Object Class Images, 2011, BMVC.

[4] G. Griffin, et al. Caltech-256 Object Category Dataset, 2007.

[5] Bernhard P. Wrobel, et al. Multiple View Geometry in Computer Vision, 2001.

[6] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[7] Dong Liu, et al. Unsupervised object category discovery via information bottleneck method, 2010, ACM Multimedia.

[8] Thomas Deselaers, et al. ClassCut for Unsupervised Class Segmentation, 2010, ECCV.

[9] Teuvo Kohonen, et al. The self-organizing map, 1990.

[10] Cordelia Schmid, et al. Toward Category-Level Object Recognition, 2006, Toward Category-Level Object Recognition.

[11] Jiri Matas, et al. Unsupervised discovery of co-occurrence in sparse high dimensional data, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12] Michael Isard, et al. Object retrieval with large vocabularies and fast spatial matching, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints, 2011.

[14] Cordelia Schmid, et al. A Comparison of Affine Region Detectors, 2005, International Journal of Computer Vision.

[15] Christos Faloutsos, et al. Unsupervised modeling of object categories using link analysis techniques, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[16] Jitendra Malik, et al. Normalized cuts and image segmentation, 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[17] Frédéric Jurie, et al. Sampling Strategies for Bag-of-Features Image Classification, 2006, ECCV.

[18] Antonio Criminisi, et al. Harvesting Image Databases from the Web, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[19] C. Schmid, et al. Indexing based on scale invariant interest points, 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[20] Frédéric Jurie, et al. Modeling spatial layout with fisher vectors for image categorization, 2011, 2011 International Conference on Computer Vision.

[21] Fei-Fei Li, et al. What Does Classifying More Than 10,000 Image Categories Tell Us?, 2010, ECCV.

[22] Cordelia Schmid, et al. Improving Bag-of-Features for Large Scale Image Search, 2010, International Journal of Computer Vision.

[23] Alexei A. Efros, et al. Discovering object categories in image collections, 2005.

[24] Andrew Zisserman, et al. Video Google: a text retrieval approach to object matching in videos, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[25] Pietro Perona, et al. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories, 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[26] Axel Pinz, et al. Computer Vision – ECCV 2006, 2006, Lecture Notes in Computer Science.

[27] I. Biederman. Recognition-by-components: a theory of human image understanding, 1987, Psychological Review.

[28] Cordelia Schmid, et al. Dataset Issues in Object Recognition, 2006, Toward Category-Level Object Recognition.

[29] Cordelia Schmid, et al. An Affine Invariant Interest Point Detector, 2002, ECCV.

[30] Mads Nielsen, et al. Computer Vision – ECCV 2002, 2002, Lecture Notes in Computer Science.

[31] Luc Van Gool, et al. The Pascal Visual Object Classes (VOC) Challenge, 2010, International Journal of Computer Vision.

[32] Pietro Perona, et al. Unsupervised Learning of Models for Recognition, 2000, ECCV.

[33] Christoph H. Lampert, et al. Unsupervised Object Discovery: A Comparison, 2010, International Journal of Computer Vision.

[34] Joni-Kristian Kämäräinen, et al. Unsupervised object discovery via self-organisation, 2012, Pattern Recognit. Lett.

[35] Joni-Kristian Kämäräinen, et al. Making Visual Object Categorization More Challenging: Randomized Caltech-101 Data Set, 2010, 2010 20th International Conference on Pattern Recognition.

[36] Christopher M. Bishop, et al. Non-linear Bayesian Image Modelling, 2000, ECCV.