Scene categorization using bag of Textons on spatial hierarchy

This paper proposes a method to recognize scene categories using bags of visual words obtained hierarchically partitioning into subregion the input images. Specifically, for each subregions the texton histogram and the extension of the sub-region is taken into account. The bags of visual words, obtained in this way, are weighted and used in a similarity measure during the categorization. Experimental tests using ten different scene categories show that the proposed approach achieves good performances with respect to the state of the art methods.

[1]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[2]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[3]  Giovanni Giuffrida,et al.  Combining visual and text features for learning in multimedia direct marketing domain , 2016, VISAPP 2008.

[4]  B. Julesz Textons, the elements of texture perception, and their interactions , 1981, Nature.

[5]  Bernt Schiele,et al.  International Journal of Computer Vision manuscript No. (will be inserted by the editor) Semantic Modeling of Natural Scenes for Content-Based Image Retrieval , 2022 .

[6]  Dorin Comaniciu,et al.  Kernel-Based Object Tracking , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).