A two level approach for scene recognition

Classifying pictures into one of several semantic categories is a classical image understanding problem. In this paper, we present a stratified approach to both binary (outdoor-indoor) and multi-category scene classification. We first learn mixture models for 20 basic classes of local image content based on color and texture information. Once trained, these models are applied to a test image and produce 20 probability density response maps (PDRMs) indicating the likelihood that each image region was produced by each class. We then extract some very simple features from those PDRMs and use them to train a bagged LDA classifier for 10 scene categories. In this process, neither an explicit region segmentation nor a spatial context model is computed. To test this classification system, we created a labeled database of 1500 photos taken under widely varying environmental and lighting conditions, using different cameras, and from 43 persons over 5 years. The classification rate for outdoor-indoor classification is 93.8%, and the classification rate for the 10 scene categories is 90.1%. As a byproduct, local image patches can be contextually labeled into the 20 basic material classes by using loopy belief propagation (Yedidia et al., 2001) as an anisotropic filter on the PDRMs, producing an image-level segmentation if desired.
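The first level of the pipeline (per-class mixture models applied to local patches to obtain response maps, followed by simple per-map features) can be illustrated with a minimal sketch. This is not the authors' implementation: the diagonal-covariance Gaussian mixtures, the per-location normalization across classes, the mean-response feature, and all function names (`gmm_log_density`, `response_maps`, `simple_features`) are assumptions made for illustration. The second level (the bagged LDA classifier) is not shown.

```python
import numpy as np

def gmm_log_density(x, weights, means, variances):
    """Log density of each row of x under a diagonal-covariance Gaussian mixture.

    x: (n, d) patch descriptors; weights: (k,); means, variances: (k, d).
    The diagonal covariance is an assumption of this sketch.
    """
    diff = x[:, None, :] - means[None, :, :]                      # (n, k, d)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    a = log_comp + np.log(weights)                                # (n, k)
    m = a.max(axis=1, keepdims=True)                              # log-sum-exp trick
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

def response_maps(patch_features, class_models):
    """Build one response map per class from a grid of patch descriptors.

    patch_features: (H, W, d); class_models: list of (weights, means, variances).
    Returns an array of shape (C, H, W). Normalizing across classes at each
    location is an assumption, chosen here for numerical stability.
    """
    h, w, d = patch_features.shape
    flat = patch_features.reshape(-1, d)
    logp = np.stack([gmm_log_density(flat, *m) for m in class_models])  # (C, H*W)
    logp -= logp.max(axis=0, keepdims=True)
    p = np.exp(logp)
    p /= p.sum(axis=0, keepdims=True)
    return p.reshape(len(class_models), h, w)

def simple_features(maps):
    """Hypothetical 'very simple' feature: mean response of each class map."""
    return maps.mean(axis=(1, 2))

# Toy usage: 4 classes, 3-dimensional descriptors on a 5x6 patch grid.
rng = np.random.default_rng(0)
models = [(np.array([0.5, 0.5]), rng.normal(size=(2, 3)) + c, np.ones((2, 3)))
          for c in range(4)]
maps = response_maps(rng.normal(size=(5, 6, 3)), models)
features = simple_features(maps)  # one value per class, input to a scene classifier
```

In the setting described above there would be 20 class models, and the resulting feature vectors would be used to train the scene-level classifier.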

[1] Michael Brady et al. Saliency, Scale and Image Description, 2001, International Journal of Computer Vision.

[2] Paul A. Viola et al. A Non-Parametric Multi-Scale Statistical Model for Natural Images, 1997, NIPS.

[3] Trevor Hastie et al. Feature Extraction for Nonparametric Discriminant Analysis, 2003.

[4] Xiaogang Wang et al. Dual-space linear discriminant analysis for face recognition, 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[5] Martial Hebert et al. An observation-constrained generative approach for probabilistic classification of image regions, 2003, Image Vis. Comput.

[6] William T. Freeman et al. Understanding belief propagation and its generalizations, 2003.

[7] Leo Breiman et al. Bagging Predictors, 1996, Machine Learning.

[8] Pietro Perona et al. Unsupervised Learning of Models for Recognition, 2000, ECCV.

[9] Antonio Torralba et al. Contextual Priming for Object Detection, 2003, International Journal of Computer Vision.

[10] Michael I. Jordan et al. On Spectral Clustering: Analysis and an algorithm, 2001, NIPS.

[11] Robert M. Haralick et al. Feature normalization and likelihood-based similarity measures for image retrieval, 2001, Pattern Recognit. Lett.

[12] Kee Tung Wong et al. Texture features for image classification and retrieval, 2002.

[13] R. Zemel et al. Multiscale conditional random fields for image labeling, 2004, CVPR 2004.

[14] Jitendra Malik et al. Contour and Texture Analysis for Image Segmentation, 2001, International Journal of Computer Vision.

[15] Ja-Chen Lin et al. A new LDA-based face recognition system which can solve the small sample size problem, 1998, Pattern Recognit.

[16] Song-Chun Zhu et al. Minimax Entropy Principle and Its Application to Texture Modeling, 1997, Neural Computation.

[17] Xiaogang Wang et al. Random sampling LDA for face recognition, 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[18] Andrew R. Webb et al. Statistical Pattern Recognition, 1999.

[19] Kristin J. Dana et al. Compact representation of bidirectional texture functions, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[20] Tony Lindeberg et al. Principles for Automatic Scale Selection, 1999.

[21] Jiebo Luo et al. Improved scene classification using efficient low-level features and semantic cues, 2004, Pattern Recognit.

[22] Gregory D. Hager et al. A Three Tiered Approach for Articulated Object Action Modeling and Recognition, 2004, NIPS.

[23] Miguel Á. Carreira-Perpiñán et al. Multiscale conditional random fields for image labeling, 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.

[24] Qi Tian et al. Discriminant-EM algorithm with application to image retrieval, 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[25] Pietro Perona et al. Object class recognition by unsupervised scale-invariant learning, 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.

[26] Jitendra Malik et al. Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons, 2001, International Journal of Computer Vision.

[27] Martial Hebert et al. Man-made structure detection in natural images using a causal multiscale random field, 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.

[28] D. Rubin et al. Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper, 1977.

[29] Jiebo Luo et al. Probabilistic spatial context models for scene content understanding, 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.

[30] Gene H. Golub et al. Matrix computations, 1983.

[31] Thomas G. Dietterich. What is machine learning?, 2020, Archives of Disease in Childhood.

[32] Andrew Zisserman et al. Classifying Images of Materials: Achieving Viewpoint and Illumination Independence, 2002, ECCV.

[33] Tin Kam Ho et al. The Random Subspace Method for Constructing Decision Forests, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[34] Martin Szummer et al. Indoor-outdoor image classification, 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.

[35] Anil K. Jain et al. Image classification for content-based indexing, 2001, IEEE Trans. Image Process.

[36] Jitendra Malik et al. Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons, 2001.

[37] Christopher M. Brown et al. Learning Spatial Configuration Models Using Modified Dirichlet Priors, 2004.

[38] Jiebo Luo et al. A computational approach to determination of main subject regions in photographic images, 2004, Image Vis. Comput.