Automatic Image Synthesis from Keywords Using Scene Context

Text is one of the simplest ways to express an idea, and an image is one of the most impactful. Therefore, if a system can synthesize an image from text without direct user manipulation, novel image synthesis applications will become available to users without artistic skills. In such a system, the text declares which objects to synthesize. However, it provides little information about the positional relations and scale of the objects, which must be estimated using common sense. In this paper, we develop a system that automatically synthesizes objects into an image, given a background image and the class name of the object to be synthesized. Given the background image and keywords as input, images of the objects to be synthesized are searched for automatically. Although some previously developed systems can synthesize an image from sketches and paintings, this is the first system that can estimate the position, scale, and appearance of objects and automatically synthesize them into images without direct user input. We propose a scene context, which indicates the position, scale, and appearance of the objects to be synthesized. The contribution of this paper is twofold: (1) a scene context extraction method for automatic image synthesis and (2) an application of automatic image synthesis using the scene context.
