Robust Scene Categorization by Learning Image Statistics in Context

We present a generic and robust approach for scene categorization. A complex scene is described by proto-concepts such as vegetation, water, fire, and sky. These proto-concepts are represented by low-level features, where we use natural image statistics to compactly represent color-invariant texture information by a Weibull distribution. We introduce the notion of contextures, which preserve the context of textures in a visual scene with an occurrence histogram (context) of similarities to proto-concept descriptors (texture). In contrast to a codebook approach, we use the similarity to all vocabulary elements to generalize beyond the code words. Visual descriptors are obtained by combining different types of contexts with different texture parameters. The visual scene descriptors are generalized to visual categories by training a support vector machine. We evaluate our approach on three datasets: 1) 50 categories for the TRECVID video dataset; 2) the Caltech 101-object images; 3) 89 categories forming the intersection of the Corel photo stock with the Art Explosion photo stock. Results show that our approach is robust across datasets while maintaining competitive performance.
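The contrast between a codebook and a similarity histogram over all vocabulary elements can be illustrated with a minimal sketch. Several things here are assumptions, not details from the paper: each image region is summarized by a two-number Weibull-style descriptor (scale, shape), similarity is a Gaussian kernel on Euclidean distance with a hypothetical `bandwidth` parameter, and the function names and sample values are purely illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hard_codebook_histogram(regions, vocabulary):
    """Codebook baseline: each region votes only for its nearest vocabulary element."""
    if not regions:
        raise ValueError("at least one region descriptor is required")
    counts = [0.0] * len(vocabulary)
    for region in regions:
        distances = [euclidean(region, word) for word in vocabulary]
        counts[distances.index(min(distances))] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def soft_similarity_histogram(regions, vocabulary, bandwidth=1.0):
    """Accumulate each region's similarity to ALL vocabulary elements.

    The Gaussian kernel is an assumed similarity measure chosen for
    illustration; the source text does not specify one.
    """
    if not regions:
        raise ValueError("at least one region descriptor is required")
    sums = [0.0] * len(vocabulary)
    for region in regions:
        for j, word in enumerate(vocabulary):
            d = euclidean(region, word)
            sums[j] += math.exp(-d * d / (2.0 * bandwidth ** 2))
    total = sum(sums)
    return [s / total for s in sums]


# Hypothetical (scale, shape) descriptors for two proto-concepts.
vocabulary = [(1.0, 0.8), (3.0, 2.0)]
# Hypothetical descriptors for three regions of one image.
regions = [(1.1, 0.9), (1.8, 1.2), (2.9, 1.9)]

hard = hard_codebook_histogram(regions, vocabulary)
soft = soft_similarity_histogram(regions, vocabulary)
```

In this toy example the middle region lies between the two proto-concepts. The hard histogram assigns it entirely to the first one, whereas the soft histogram credits both, which is the sense in which similarity to all vocabulary elements generalizes beyond the code words. Either histogram could then serve as the input vector to a classifier such as a support vector machine.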
