Object categorization by learned universal visual dictionary

This paper presents a new algorithm for the automatic recognition of object classes from images (categorization). Compact and yet discriminative appearance-based object class models are automatically learned from a set of training images. The method is simple and extremely fast, making it suitable for many applications such as semantic image retrieval, Web search, and interactive image editing. It classifies a region according to the proportions of different visual words (clusters in feature space). The specific visual words and the typical proportions in each object are learned from a segmented training set. The main contribution of this paper is twofold: i) an optimally compact visual dictionary is learned by pair-wise merging of visual words from an initially large dictionary. The final visual words are described by GMMs. ii) A novel statistical measure of discrimination is proposed which is optimized by each merge operation. High classification accuracy is demonstrated for nine object classes on photographs of real objects viewed under general lighting conditions, poses and viewpoints. The set of test images used for validation comprise: i) photographs acquired by us, ii) images from the Web and iii) images from the recently released Pascal dataset. The proposed algorithm performs well on both texture-rich objects (e.g. grass, sky, trees) and structure-rich ones (e.g. cars, bikes, planes)

[1]  James M. Kasson,et al.  An analysis of selected computer interchange color spaces , 1992, TOGS.

[2]  Jitendra Malik,et al.  Recognizing surfaces using three-dimensional textons , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[3]  Carlo Tomasi,et al.  Texture-based image retrieval without segmentation , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[4]  Naftali Tishby,et al.  The information bottleneck method , 2000, ArXiv.

[5]  Michel Vidal-Naquet,et al.  A Fragment-Based Approach to Object Representation and Classification , 2001, IWVF.

[6]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[7]  Andrew Zisserman,et al.  Texture classification: are filter banks necessary? , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[8]  Pietro Perona,et al.  A Bayesian approach to unsupervised one-shot learning of object categories , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[9]  Andrew Zisserman,et al.  Extending Pictorial Structures for Object Recognition , 2004, BMVC.

[10]  G. Nason,et al.  A Haar-Fisz Algorithm for Poisson Intensity Estimation , 2004 .

[11]  Andrew Zisserman,et al.  A Statistical Approach to Texture Classification from Single Images , 2004, International Journal of Computer Vision.

[12]  Pietro Perona,et al.  A Visual Category Filter for Google Images , 2004, ECCV.

[13]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[14]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[15]  Tom Minka,et al.  Vision texture for annotation , 1995, Multimedia Systems.