SLAMCast: Large-Scale, Real-Time 3D Reconstruction and Streaming for Immersive Multi-Client Live Telepresence

Real-time 3D scene reconstruction from RGB-D sensor data, as well as the exploration of such data in VR/AR settings, has seen tremendous progress in recent years. Combining these two components into telepresence systems, however, poses significant technical challenges. All approaches proposed so far are extremely demanding on input and output devices, compute resources and transmission bandwidth, and they do not reach the level of immediacy required for applications such as remote collaboration. Here, we introduce what we believe is the first practical client-server system for real-time capture and many-user exploration of static 3D scenes. Our system is based on the observation that interactive frame rates are sufficient for capture and reconstruction, and that real-time performance is only required on the client side to achieve lag-free view updates when rendering the 3D model. Starting from this insight, we extend previous voxel block hashing frameworks by introducing a novel thread-safe GPU hash map data structure that is robust under massively concurrent retrieval, insertion and removal of entries at the thread level. We further propose a novel transmission scheme for volume data that is specifically targeted at Marching Cubes geometry reconstruction and enables a 90% reduction in bandwidth between the server and the exploration clients. The resulting system places only moderate demands on network bandwidth, latency and client-side computation, which enables it to rely entirely on consumer-grade hardware, including mobile devices. We demonstrate that our technique achieves state-of-the-art representation accuracy while providing, for any number of clients, an immersive, fluid and lag-free viewing experience even during network outages.
