Interactive Learning of Spatial Knowledge for Text to 3D Scene Generation

We present an interactive text-to-3D scene generation system that learns the expected spatial layout of objects from data. The user provides natural language text, from which the system extracts explicit constraints on the objects that should appear in the scene. Given these constraints, the system uses prior observations of spatial arrangements in a database of scenes to infer the most likely layout of the objects. Through further user interaction, the system gradually refines its estimates of where objects should be placed. We present example generated scenes and user interaction scenarios.
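
The loop described above (observe arrangements, infer a likely placement, refine from user feedback) can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the `SpatialPrior` class, the `place` helper, the use of a mean offset as the "most likely" placement, and the idea of weighting a user correction more heavily than a database observation are hypothetical and are not taken from the system itself.

```python
from collections import defaultdict
from statistics import fmean


class SpatialPrior:
    """Toy prior over where `obj` sits relative to `ref`.

    Hypothetical illustration only: it stores observed 3D offsets and
    uses their mean as the expected placement.
    """

    def __init__(self):
        # (obj, ref) -> list of observed (dx, dy, dz) offsets
        self._offsets = defaultdict(list)

    def observe(self, obj, ref, offset, weight=1):
        """Record an offset; weight > 1 counts the observation several times."""
        self._offsets[(obj, ref)].extend([tuple(offset)] * weight)

    def expected_offset(self, obj, ref):
        """Return the mean observed offset, or None if the pair is unseen."""
        samples = self._offsets.get((obj, ref))
        if not samples:
            return None
        return tuple(fmean(axis) for axis in zip(*samples))


def place(obj, ref, ref_position, prior):
    """Position `obj` by adding the expected offset to the reference position."""
    offset = prior.expected_offset(obj, ref)
    if offset is None:
        return None
    return tuple(p + d for p, d in zip(ref_position, offset))


# Observations standing in for a scene database (made-up numbers).
prior = SpatialPrior()
prior.observe("lamp", "desk", (0.5, 0.0, 1.0))
prior.observe("lamp", "desk", (0.0, 0.0, 1.0))
before = place("lamp", "desk", (2.0, 3.0, 0.0), prior)

# A user moves the lamp; the correction is counted twice (an assumed policy).
prior.observe("lamp", "desk", (1.0, 0.0, 1.0), weight=2)
after = place("lamp", "desk", (2.0, 3.0, 0.0), prior)
```

In this sketch the estimate shifts toward the user's correction without discarding the earlier observations, which is one simple way to read "gradually refines its estimates"; a real system would likely use a richer distribution than a single mean.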
