SceneSeer: 3D Scene Design with Natural Language

Designing 3D scenes is currently a creative task that requires significant expertise and effort with complex 3D design interfaces. This effortful design process stands in stark contrast to the ease with which people can use language to describe real and imaginary environments. We present SceneSeer: an interactive text-to-3D scene generation system that allows a user to design 3D scenes using natural language. A user provides input text from which we extract explicit constraints on the objects that should appear in the scene. Given these explicit constraints, the system uses a spatial knowledge base, learned from an existing database of 3D scenes and 3D object models, to infer an arrangement of the objects that forms a natural scene matching the input description. Using textual commands, the user can then iteratively refine the created scene by adding, removing, replacing, and manipulating objects. We evaluate the quality of 3D scenes generated by SceneSeer in a perceptual evaluation experiment, comparing against manually designed scenes and simpler baselines for 3D scene generation, and we demonstrate how the generated scenes can be iteratively refined through simple natural language commands.
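To make the described pipeline concrete, below is a minimal, hypothetical sketch in Python of the three stages the abstract names: extracting explicit object constraints from text, inferring an arrangement (standing in for the learned spatial knowledge base), and iteratively refining the scene via textual commands. All names, data structures, and the toy vocabulary here are illustrative assumptions, not the authors' actual implementation or API.

```python
# Hypothetical sketch of the text-to-scene pipeline described in the
# abstract; every identifier below is illustrative, not SceneSeer's API.
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    category: str                      # e.g. "desk", "lamp"
    position: tuple = (0.0, 0.0, 0.0)  # placement filled in by inference


@dataclass
class Scene:
    objects: list = field(default_factory=list)


def extract_constraints(text: str) -> list:
    """Parse the input description into explicit object constraints.
    A real system would use an NLP pipeline; this toy version just
    matches words against a small assumed vocabulary."""
    vocabulary = {"desk", "chair", "lamp", "bookshelf"}
    return [w.strip(".,") for w in text.lower().split() if w.strip(".,") in vocabulary]


def infer_arrangement(categories: list) -> Scene:
    """Stand-in for the spatial-knowledge-base lookup: places each
    object at a placeholder position instead of a learned one."""
    scene = Scene()
    for i, category in enumerate(categories):
        scene.objects.append(SceneObject(category, position=(i * 1.0, 0.0, 0.0)))
    return scene


def apply_command(scene: Scene, command: str) -> Scene:
    """Iterative refinement via a textual command. Only a toy
    'remove <category>' command is handled in this sketch."""
    verb, _, target = command.partition(" ")
    if verb == "remove":
        scene.objects = [o for o in scene.objects if o.category != target]
    return scene


# Usage: generate a scene from a description, then refine it.
scene = infer_arrangement(extract_constraints("A desk with a lamp and a chair."))
scene = apply_command(scene, "remove lamp")
print([o.category for o in scene.objects])  # ['desk', 'chair']
```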
