Scene Parsing Using Region-Based Generative Models

Semantic scene classification is a challenging problem in computer vision. In contrast to the common approach of using low-level features computed from the whole scene, we propose "scene parsing", which uses semantic object detectors (e.g., for sky, foliage, and pavement) together with region-based scene-configuration models. Because semantic detectors are unreliable in practice, it is critical to develop a region-based generative model of outdoor scenes built on the characteristic objects in a scene and the spatial relationships between them. Since a fully connected scene-configuration model is intractable, we model only pairwise relationships between regions and estimate scene probabilities using loopy belief propagation on a factor graph. We demonstrate the promise of this approach on a set of over 2000 outdoor photographs, comparing it with existing discriminative approaches and with approaches that use low-level features.
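To make the inference step concrete, the following is a minimal sketch of sum-product loopy belief propagation on a pairwise model over region labels. It is not the implementation evaluated here: the function name `loopy_bp`, the three region names, the label set, and every potential value are hypothetical and chosen only for illustration. The sketch returns approximate per-region label marginals for a single scene model; turning those into scene probabilities would additionally require a per-scene likelihood estimate, which is omitted.

```python
LABELS = ["sky", "foliage", "pavement"]  # illustrative label set


def loopy_bp(unary, pairwise, iters=50):
    """Approximate node marginals with synchronous sum-product updates.

    unary:    {node: [potential per label]} (e.g. detector beliefs)
    pairwise: {(i, j): table} with table[label_i][label_j] > 0
    """
    n_labels = len(next(iter(unary.values())))

    # One message per directed edge, initialised to uniform.
    msgs = {}
    for (i, j) in pairwise:
        msgs[(i, j)] = [1.0 / n_labels] * n_labels
        msgs[(j, i)] = [1.0 / n_labels] * n_labels

    def potential(i, j, a, b):
        # Compatibility of label a at node i with label b at node j,
        # looked up in whichever orientation the table was stored.
        if (i, j) in pairwise:
            return pairwise[(i, j)][a][b]
        return pairwise[(j, i)][b][a]

    for _ in range(iters):
        new_msgs = {}
        for (i, j) in msgs:
            out = []
            for b in range(n_labels):
                total = 0.0
                for a in range(n_labels):
                    # Local evidence at i times messages from all
                    # neighbours of i except the recipient j.
                    incoming = unary[i][a]
                    for (k, t) in msgs:
                        if t == i and k != j:
                            incoming *= msgs[(k, i)][a]
                    total += incoming * potential(i, j, a, b)
                out.append(total)
            z = sum(out)
            new_msgs[(i, j)] = [v / z for v in out]
        msgs = new_msgs

    # Belief at each node: local evidence times all incoming messages.
    beliefs = {}
    for i in unary:
        b = list(unary[i])
        for (k, t) in msgs:
            if t == i:
                b = [x * m for x, m in zip(b, msgs[(k, i)])]
        z = sum(b)
        beliefs[i] = [x / z for x in b]
    return beliefs


# Hypothetical detector outputs for three vertically stacked regions.
unary = {
    "top": [0.6, 0.3, 0.1],
    "middle": [0.3, 0.5, 0.2],
    "bottom": [0.4, 0.2, 0.4],
}

# Hypothetical "upper region above lower region" compatibilities:
# above[a][b] scores label a sitting above label b.
above = [
    [1.0, 2.0, 2.0],  # sky above sky / foliage / pavement
    [0.2, 1.0, 2.0],  # foliage above ...
    [0.1, 0.3, 1.0],  # pavement above ...
]

# All three pairs are connected, so the graph contains a loop.
pairwise = {
    ("top", "middle"): above,
    ("middle", "bottom"): above,
    ("top", "bottom"): above,
}

beliefs = loopy_bp(unary, pairwise)
for region, dist in beliefs.items():
    print(region, {lab: round(p, 3) for lab, p in zip(LABELS, dist)})
```

On a tree-structured graph these updates give exact marginals; on a graph with loops, such as the three-region example, they give an approximation and convergence is not guaranteed, which is why the iteration count is fixed here.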
