Collaborative VR-Based 3D Labeling of Live-Captured Scenes by Remote Users

Previous work on interactive 3D labeling has focused on improving the user experience through virtual and augmented reality, thereby speeding up the labeling of scenes. In this article, we present a novel interactive, collaborative VR-based system for the 3D labeling of live-captured scenes by multiple remotely connected users, based on sparse multi-user input with automatic label propagation and completion. Our system is therefore particularly beneficial when multiple users label different scene parts in parallel, each from an adequate viewpoint. The proposed system relies on 1) the RGB-D capture of an environment by a user, 2) a reconstruction client that integrates this stream into a 3D model, 3) a server that receives scene updates and manages the global 3D scene model, client requests, and the integration and propagation of labels, 4) labeling clients that allow each user to independently explore and label the scene in VR, and 5) remotely connected users who provide sparse 3D labels that control the label propagation over objects and the label prediction to other scene parts. Our evaluation demonstrates an intuitive collaborative 3D labeling experience as well as the system's capability to meet efficiency constraints regarding reconstruction speed, data streaming, visualization, and labeling.
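The label propagation described above, where sparse user annotations are expanded over whole objects, can be illustrated with a minimal sketch. This is not the paper's actual algorithm; it assumes a hypothetical pre-computed geometric over-segmentation of the reconstructed scene (a mapping from voxels to segment IDs) and simply spreads each sparse label to every voxel of the same segment:

```python
from collections import defaultdict

def propagate_labels(segments, sparse_labels):
    """Expand sparse per-voxel labels to dense labels via segments.

    segments: dict mapping voxel coordinate -> segment id
              (hypothetical output of a geometric over-segmentation)
    sparse_labels: dict mapping a few labeled voxels -> semantic label
    Returns a dict mapping every voxel of a touched segment -> label.
    """
    # Group voxels by the segment they belong to.
    seg_to_voxels = defaultdict(list)
    for voxel, seg in segments.items():
        seg_to_voxels[seg].append(voxel)

    # Each user-labeled voxel donates its label to its whole segment.
    dense = {}
    for voxel, label in sparse_labels.items():
        for v in seg_to_voxels[segments[voxel]]:
            dense[v] = label
    return dense

# One sparse click on a "chair" segment labels the entire segment.
segments = {(0, 0, 0): 1, (0, 0, 1): 1, (1, 0, 0): 2}
dense = propagate_labels(segments, {(0, 0, 0): "chair"})
```

In the full system, such propagation would run on the server so that all labeling clients receive a consistent, globally updated scene model.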
