An ontology for generating descriptions about natural outdoor scenes

We present an image ontology for generating descriptive text about highly unconstrained natural outdoor images taken under widely varying conditions, such as different lighting and viewpoints. The ontology predefines the visual content we are interested in describing. Unlike other image description techniques, which tend to be purely object-centric, we use a holistic scene ontology for description. The primitive units defined by the ontology are extracted from an image via stochastic processes. Similarly, the attributes of these units, also specified by the ontology, are evaluated, as are binary and ternary relationships between relevant primitives. The values, attributes and relationships of the primitive units are then combined, according to a predefined set of production rules, to generate rich, descriptive sentences about the image. Evaluation strategies are implemented to quantitatively test the meaningfulness of the generated sentences. Results indicate that the proposed scene ontology aids in generating highly relevant, naturalistic and meaningful sentences describing natural outdoor images.
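As a rough illustration of how primitive units, their attributes and a binary relationship might be combined by a production rule, consider the minimal sketch below. It is not the authors' implementation: the class names `Primitive` and `Relation`, the functions `noun_phrase` and `describe`, the single rule `S -> NP "is" PREDICATE NP`, and the example vocabulary are all assumptions made for illustration, and the extraction of primitives from an image is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Primitive:
    """A hypothetical scene primitive, e.g. "sky" or "water"."""
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relation:
    """A hypothetical binary relationship between two primitives."""
    subject: Primitive
    predicate: str  # e.g. "above", "next to"
    obj: Primitive


def noun_phrase(p: Primitive) -> str:
    # Assumed rule: NP -> "the" ATTRIBUTE* NAME
    return "the " + " ".join(p.attributes + [p.name])


def describe(rel: Relation) -> str:
    # Assumed rule: S -> NP "is" PREDICATE NP
    body = f"{noun_phrase(rel.subject)} is {rel.predicate} {noun_phrase(rel.obj)}"
    # Capitalize the first letter only and close the sentence.
    return body[0].upper() + body[1:] + "."


sky = Primitive("sky", ["cloudy"])
water = Primitive("water", ["calm"])
print(describe(Relation(sky, "above", water)))
# prints "The cloudy sky is above the calm water."
```

A ternary relationship could be handled in the same style by adding a second object slot and a matching rule; the sketch keeps to the binary case for brevity.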
