Automatic Image Annotation for Description of Urban and Outdoor Scenes

In this paper we present a novel approach for the automatic annotation of objects or regions in images based on their color and texture. Following the proposed generalized architecture for automatic generation of image content descriptions, the detected regions are labeled by a cascade SVM-based classifier and mapped to a structure that reflects their hierarchical and spatial relations, which is then used by a text generation engine. To test the designed system for automatic image annotation, around 2,000 images of outdoor and indoor scenes from the standard IAPR TC-12 image dataset were processed, yielding an average classification precision of about 75% with 94% recall. Extending the classifier with a texture detector based on Gabor filters improved the precision of the color-based classification by up to 15 ± 5%. The proposed approach offers a good compromise between the classification precision of image regions and speed, although processing takes up to 1 s per image. The approach may be used as a tool for efficient automatic image understanding and description.
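The abstract does not specify the implementation details of the cascade SVM classifier or of the Gabor filter bank, but a minimal sketch of the underlying idea, per-region color histograms combined with Gabor texture responses fed to an SVM, might look like the following. OpenCV and scikit-learn are assumed here; all parameter values (histogram bins, number of orientations, kernel sizes, SVM settings) are illustrative choices, not values taken from the paper.

```python
# Illustrative sketch only: feature extraction and classification for one
# segmented image region. The paper's actual cascade architecture and
# parameters are not given in the abstract; everything below is an assumption.
import numpy as np
import cv2
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def color_features(region_bgr, bins=8):
    """Normalized HSV color histogram of a region (flattened to a vector)."""
    hsv = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [bins, bins, bins],
                        [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()


def gabor_features(region_bgr, n_thetas=4, kernel_sizes=(7, 11, 15)):
    """Mean/std responses of a small Gabor filter bank as a texture descriptor."""
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    feats = []
    for ksize in kernel_sizes:
        for i in range(n_thetas):
            theta = i * np.pi / n_thetas
            kernel = cv2.getGaborKernel((ksize, ksize), sigma=4.0, theta=theta,
                                        lambd=10.0, gamma=0.5, psi=0.0)
            response = cv2.filter2D(gray, cv2.CV_32F, kernel)
            feats.extend([response.mean(), response.std()])
    return np.array(feats, dtype=np.float32)


def region_descriptor(region_bgr):
    """Concatenate color and texture descriptors for one region crop."""
    return np.concatenate([color_features(region_bgr),
                           gabor_features(region_bgr)])


# Hypothetical training step: `regions` are cropped region images and
# `labels` their class names (e.g. "sky", "building", "vegetation"),
# as would be available from the segmented IAPR TC-12 benchmark.
def train_region_classifier(regions, labels):
    X = np.array([region_descriptor(r) for r in regions])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X, labels)
    return clf
```

In the paper's pipeline, the labels predicted for each region would then be organized into the hierarchical/spatial structure consumed by the text generation engine; that stage is not sketched here.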
