Experimental Evaluation of the Bag-of-Features Model for Unsupervised Learning of Images

The Bag-of-Features (BoF) model [2] is a popular approach that represents an image as an orderless collection of features, without using any spatial information. Each image is represented by a frequency histogram of visual words drawn from a codebook. Although the model is simple to implement, it involves several steps in which parameters and algorithms must be chosen. This work assesses the performance of the model for unsupervised learning on a set of images, also called image clustering. It also aims to provide insight into the different steps of the model and to compare alternative algorithms in order to achieve the best performance for a given dataset. All the source code of this work is available as open source on GitHub.

Image clustering has many applications, including social network mining, and more specifically the summarization of the large amount of content shared every day by millions of users.

Obtaining the BoF representation of an image collection requires several steps, illustrated in Figure 1. The first is image description, in which keypoints or patches are detected in the input images and then described using a chosen algorithm. The next step is codebook learning, in which a portion of the keypoints extracted from the images is clustered to obtain a codebook of visual words. This usually requires sampling from the total set of keypoints obtained from the images. The following step is the BoF representation itself, in which each image is represented by a histogram of the frequencies of visual words from the previously learned codebook. The words are then filtered and the histograms are normalized following a chosen methodology. Finally, clustering is applied to the final representation of the images.
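The steps after image description can be illustrated with a minimal sketch. It is not the code released with this work: the function names (`learn_codebook`, `assign_words`, `bof_histogram`) are hypothetical, the descriptors are synthetic random vectors standing in for real keypoint descriptors, plain k-means with Euclidean distance is assumed for both codebook learning and the final clustering, L1 normalization is assumed for the histograms, and the word-filtering step is omitted.

```python
import numpy as np


def assign_words(descriptors, codebook):
    """Return the index of the nearest codebook entry (Euclidean) per descriptor."""
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)


def learn_codebook(descriptors, n_words, n_iter=20, seed=0):
    """Cluster descriptors with plain Lloyd's k-means and return the centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from distinct, randomly chosen descriptors.
    start = rng.choice(len(descriptors), n_words, replace=False)
    centroids = descriptors[start].astype(float)
    for _ in range(n_iter):
        labels = assign_words(descriptors, centroids)
        for k in range(n_words):
            members = descriptors[labels == k]
            if len(members):  # keep the previous centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return centroids


def bof_histogram(descriptors, codebook):
    """L1-normalised frequency histogram of visual words for one image."""
    words = assign_words(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


# Synthetic stand-in for descriptors: 6 images, 50 descriptors of dimension 8 each.
rng = np.random.default_rng(0)
images = [rng.normal(loc=i % 2, size=(50, 8)) for i in range(6)]

# Codebook learning on a random sample of all descriptors.
all_desc = np.vstack(images)
sample = all_desc[rng.choice(len(all_desc), 150, replace=False)]
codebook = learn_codebook(sample, n_words=4)

# BoF representation: one normalised histogram per image.
histograms = np.array([bof_histogram(d, codebook) for d in images])

# Final step: cluster the images on their histograms (k-means reused here).
image_centroids = learn_codebook(histograms, n_words=2)
image_labels = assign_words(histograms, image_centroids)
```

In this sketch the codebook size, the sample size and the number of image clusters are arbitrary values chosen for illustration; they correspond to the kind of parameters whose choice the evaluation is concerned with.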

[1] James Bailey, et al. Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance, 2010, J. Mach. Learn. Res.

[2] Yixin Chen, et al. CLUE: cluster-based retrieval of images by unsupervised learning, 2005, IEEE Transactions on Image Processing.

[3] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[4] Andrew Zisserman, et al. Video Google: a text retrieval approach to object matching in videos, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5] Gary R. Bradski, et al. ORB: An efficient alternative to SIFT or SURF, 2011, 2011 International Conference on Computer Vision.

[6] D. Sculley, et al. Web-scale k-means clustering, 2010, WWW '10.

[7] Tom Drummond, et al. Machine Learning for High-Speed Corner Detection, 2006, ECCV.

[8] Ka Yee Yeung, et al. Validating clustering for gene expression data, 2001, Bioinform.

[9] Junjie Wu, et al. Towards information-theoretic K-means clustering for image indexing, 2013, Signal Process.

[10] Luc Van Gool, et al. SURF: Speeded Up Robust Features, 2006, ECCV.

[11] Cécile Barat, et al. Fusion of tf.idf weighted bag of visual features for image classification, 2010, 2010 International Workshop on Content Based Multimedia Indexing (CBMI).

[12] Pierre Vandergheynst, et al. FREAK: Fast Retina Keypoint, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13] Frédéric Jurie, et al. Sampling Strategies for Bag-of-Features Image Classification, 2006, ECCV.

[14] Ivan Laptev, et al. Recognizing human actions in still images: a study of bag-of-features and part-based representations, 2010, BMVC.

[15] Fei-Fei Li, et al. What, where and who? Classifying events by scene and object recognition, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[16] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[17] Sergios Theodoridis, et al. Pattern Recognition, 1998, IEEE Trans. Neural Networks.

[18] Yiannis Kompatsiaris, et al. Cluster-Based Landmark and Event Detection for Tagged Photo Collections, 2011, IEEE MultiMedia.

[19] Chong-Wah Ngo, et al. Evaluating bag-of-visual-words representations in scene classification, 2007, MIR '07.

[20] Vincent Lepetit, et al. BRIEF: Binary Robust Independent Elementary Features, 2010, ECCV.

[21] Matthijs C. Dorst. Distinctive Image Features from Scale-Invariant Keypoints, 2011.

[22] Olatz Arbelaitz, et al. An extensive comparative study of cluster validity indices, 2013, Pattern Recognit.

[23] Antonio Torralba, et al. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope, 2001, International Journal of Computer Vision.

[24] Sameer A. Nene, et al. Columbia Object Image Library (COIL100), 1996.

[25] Eric Jones, et al. SciPy: Open Source Scientific Tools for Python, 2001.

[26] Kurt Konolige, et al. CenSurE: Center Surround Extremas for Realtime Feature Detection and Matching, 2008, ECCV.

[27] Steven M. Seitz, et al. Scene Summarization for Online Image Collections, 2007, 2007 IEEE 11th International Conference on Computer Vision.