Generating text description from content-based annotated image

This paper proposes a statistical generative model for producing sentences from an annotated image. Images are first segmented into regions using a graph-based segmentation algorithm, and features are computed over each region. Given a training set of annotated images, we parse each image to obtain positional information for the labeled regions. An SVM then estimates the probabilities of label–preposition combinations, yielding a data-to-text mapping. The image content is expressed in a standard semantic representation, and sentences are finally generated from the resulting XML report. Focusing on landscape pictures, we carried out experiments on a dataset we collected and annotated ourselves and obtained promising results.
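
The spatial-relation step can be illustrated with a minimal Python sketch using scikit-learn. The feature layout, label names, and preposition set below are hypothetical assumptions for illustration, not the paper's actual setup: a probabilistic SVM scores prepositions for a pair of labeled regions, and the best-scoring preposition fills a simple sentence template.

```python
# Minimal sketch (hypothetical features/labels): score prepositions for a
# labeled region pair with a probabilistic SVM, then fill a sentence template.
import numpy as np
from sklearn.svm import SVC

# Each training example describes the geometry of a (subject, object) region
# pair: [dx, dy, overlap, size_ratio]; the target is the preposition an
# annotator chose for that pair.
X_train = np.array([
    [0.0, -0.6, 0.0, 1.2],   # subject well above object
    [0.0, -0.5, 0.1, 0.9],
    [0.0,  0.1, 0.7, 0.3],   # subject mostly inside object
    [0.1,  0.0, 0.8, 0.2],
    [0.6,  0.0, 0.0, 1.0],   # subject beside object
    [0.7,  0.1, 0.0, 1.1],
])
y_train = np.array(["above", "above", "in", "in", "beside", "beside"])

# Probabilistic SVM over label-preposition combinations (RBF kernel assumed).
clf = SVC(kernel="rbf", probability=True, random_state=0)
clf.fit(X_train, y_train)

def describe(subject, obj, pair_features):
    """Pick the most probable preposition and render a template sentence."""
    probs = clf.predict_proba([pair_features])[0]
    best = clf.classes_[np.argmax(probs)]
    return f"There is a {subject} {best} the {obj}."

# Hypothetical region pair from a segmented landscape image.
print(describe("tree", "lake", [0.05, -0.55, 0.0, 1.0]))
# -> "There is a tree above the lake."
```

In the full pipeline, the selected preposition and region labels would be written into the XML semantic representation before sentence generation rather than rendered directly.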
