Leveraging probabilistic season and location context models for scene understanding

Recent research has shown the power of context-aware scene understanding in bridging the semantic gap between high-level semantic concepts and low-level image features. In this paper, we present a new method that exploits nonvisual context, namely the season and the approximate location in which pictures were taken, to facilitate region (object) annotation in consumer photos. Our method does not require precise time and location from the capture device or from user input. Instead, it learns from coarse location (e.g., states in the US) and time (e.g., seasons) information, which can be obtained automatically from picture metadata or through minimal user input (e.g., grouping). In addition, visual context within the image is obtained by analyzing the spatial relationships between different regions (objects) in the scene. The visual and nonvisual context information are fused using a probabilistic graphical model to improve the accuracy of object region recognition. Our method has been evaluated on a database of over 10,000 regions in more than 1,000 images collected from both the Web and consumers. Experimental results show that incorporating the season and location context significantly improves region recognition performance.
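The fusion of visual and nonvisual context can be illustrated with a deliberately simplified sketch. It assumes a factorization in which each region's visual evidence is conditionally independent of season and location given the region's label, so the nonvisual context acts as a label prior P(label | season, location), while spatial relationships between regions enter as pairwise compatibility terms. This is not the authors' graphical model: the label set, all probabilities, the `compat` table, and the function names `fuse_region` and `map_labeling` are hypothetical, and brute-force enumeration stands in for proper inference (e.g., belief propagation), so it only scales to a handful of regions.

```python
import itertools
import math


def normalize_log_scores(scores):
    """Turn unnormalized log-scores into probabilities (subtract max for stability)."""
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {label: math.exp(s - m) / z for label, s in scores.items()}


def fuse_region(likelihood, context_prior):
    """Posterior over labels for one region (hypothetical helper):
    P(label | x, season, location) is proportional to P(x | label) * P(label | season, location),
    assuming the visual evidence x is independent of season/location given the label."""
    scores = {lab: math.log(likelihood[lab]) + math.log(context_prior[lab])
              for lab in context_prior}
    return normalize_log_scores(scores)


def map_labeling(likelihoods, context_prior, compat, relations, default=0.1):
    """Brute-force MAP labeling of a few regions (hypothetical helper).
    likelihoods[r][label] : P(x_r | label) for region r
    relations             : (i, j, rel) triples, e.g. (0, 1, "above")
    compat[rel][(a, b)]   : made-up compatibility of label a at region i with label b
                            at region j; unseen pairs fall back to `default`."""
    labels = list(context_prior)
    best, best_score = None, -math.inf
    for assignment in itertools.product(labels, repeat=len(likelihoods)):
        # Unary terms: visual likelihood times the season/location prior.
        score = sum(math.log(likelihoods[r][lab]) + math.log(context_prior[lab])
                    for r, lab in enumerate(assignment))
        # Pairwise terms: spatial-relationship compatibilities between regions.
        for i, j, rel in relations:
            score += math.log(compat[rel].get((assignment[i], assignment[j]), default))
        if score > best_score:
            best, best_score = assignment, score
    return best


# Made-up numbers for illustration only; these are not estimates from the paper.
WINTER_NORTH_PRIOR = {"sky": 0.35, "grass": 0.05, "snow": 0.40, "water": 0.20}
SUMMER_NORTH_PRIOR = {"sky": 0.45, "grass": 0.35, "snow": 0.01, "water": 0.19}
WHITE_REGION = {"sky": 0.45, "grass": 0.05, "snow": 0.45, "water": 0.05}
TOP_REGION = {"sky": 0.50, "grass": 0.05, "snow": 0.40, "water": 0.05}
BOTTOM_REGION = {"sky": 0.15, "grass": 0.05, "snow": 0.50, "water": 0.30}
ABOVE_COMPAT = {"above": {("sky", "snow"): 0.9, ("sky", "water"): 0.9,
                          ("sky", "grass"): 0.9, ("snow", "sky"): 0.01}}

winter_post = fuse_region(WHITE_REGION, WINTER_NORTH_PRIOR)
summer_post = fuse_region(WHITE_REGION, SUMMER_NORTH_PRIOR)
print(max(winter_post, key=winter_post.get), max(summer_post, key=summer_post.get))
print(map_labeling([TOP_REGION, BOTTOM_REGION], WINTER_NORTH_PRIOR,
                   ABOVE_COMPAT, [(0, 1, "above")]))
```

With these invented numbers, the white region is a tie between sky and snow on visual evidence alone; the winter prior tips it toward snow and the summer prior toward sky, which is the kind of disambiguation that season and location context is meant to provide. The pairwise "above" term then favors labelings such as sky over snow for vertically stacked regions.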