Paper information - Text2SceneVR: Generating Hypertexts with VAnnotatoR as a Pre-processing Step for Text2Scene Systems

The automatic generation of digital scenes from texts is a central task of computer science. It requires a form of text comprehension whose automation depends on the availability of sufficiently large, diverse, deeply annotated, and freely available data. This paper introduces Text2SceneVR, a system that addresses this bottleneck by allowing its users to create a kind of spatial hypertext in Virtual Reality (VR). We describe Text2SceneVR's data model, its user interface, and a number of problems that arise because natural language often expresses spatial relations only implicitly, which Text2SceneVR aims to address while remaining language independent. Finally, we present a user study in which we evaluated Text2SceneVR.

Alexander Mehler | Alexander Henlein | Giuseppe Abrami | Attila Kett
