Large-scale, real-time 3D scene reconstruction on a mobile device

Google’s Project Tango has made integrated depth sensing and onboard visual-inertial odometry available to mobile devices such as phones and tablets. In this work, we explore the problem of large-scale, real-time 3D reconstruction on mobile devices of this type. Solving this problem is a prerequisite for many indoor applications, including navigation, augmented reality and building scanning. The main challenges include dealing with noisy and low-frequency depth data and managing limited computational and memory resources. State-of-the-art approaches to large-scale dense reconstruction require large amounts of memory and high-performance GPU computing. Other existing 3D reconstruction approaches on mobile devices either build only a sparse reconstruction, offload their computation to other devices, or require long post-processing to extract the geometric mesh. In contrast, we can reconstruct and render a global mesh on the fly, using only the mobile device’s CPU, in very large (300 m$^2$) scenes, at a resolution of 2–3 cm. To achieve this, we divide the scene into spatial volumes indexed by a hash map. Each volume contains the truncated signed distance function for that area of space, as well as the mesh segment derived from the distance function. This approach allows us to focus computational and memory resources only on areas of the scene that are currently observed, and to leverage parallelization techniques for multi-core processing. Furthermore, we describe an on-device post-processing method for fusing datasets from multiple, independent trials, in order to improve the quality and coverage of the reconstruction. We discuss how the particularities of the devices impact our algorithm and implementation decisions. Finally, we provide both qualitative and quantitative results on publicly available RGB-D datasets, and on datasets collected in real time from two devices.
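
The spatially hashed layout described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Chunk`, `ChunkMap`, `locate` and `integrate`, the chunk size, the truncation distance and the weight cap are all assumptions made for illustration, and the weighted running average is the standard volumetric fusion rule, which the abstract does not spell out. The sketch only shows how a hash map keyed by integer volume coordinates lets memory be allocated where observations actually fall.

```python
import math
from dataclasses import dataclass, field

VOXEL_SIZE = 0.03   # metres; within the 2-3 cm resolution quoted in the abstract
CHUNK_DIM = 16      # voxels per chunk side (assumed value)
TRUNCATION = 0.1    # truncation distance in metres (assumed value)
MAX_WEIGHT = 100.0  # cap on accumulated weight (assumed value)


@dataclass
class Chunk:
    """One spatial volume: a dense block of truncated signed distances."""
    sdf: list = field(default_factory=lambda: [TRUNCATION] * CHUNK_DIM ** 3)
    weight: list = field(default_factory=lambda: [0.0] * CHUNK_DIM ** 3)
    mesh_dirty: bool = False  # set when the chunk's mesh segment needs rebuilding


class ChunkMap:
    """Hash map from integer chunk coordinates to chunks, allocated on demand."""

    def __init__(self):
        self.chunks = {}

    @staticmethod
    def locate(point):
        """Return (chunk key, flat voxel index) for a point given in metres."""
        voxel = [math.floor(c / VOXEL_SIZE) for c in point]
        # Floor division and modulo keep negative coordinates consistent.
        key = tuple(v // CHUNK_DIM for v in voxel)
        lx, ly, lz = (v % CHUNK_DIM for v in voxel)
        return key, (lz * CHUNK_DIM + ly) * CHUNK_DIM + lx

    def integrate(self, point, signed_distance, obs_weight=1.0):
        """Fuse one signed-distance observation into the voxel containing point."""
        key, idx = self.locate(point)
        if key not in self.chunks:
            self.chunks[key] = Chunk()  # allocate only where data is observed
        chunk = self.chunks[key]
        d = max(-TRUNCATION, min(TRUNCATION, signed_distance))  # truncate
        w = chunk.weight[idx]
        # Weighted running average of the truncated signed distance.
        chunk.sdf[idx] = (chunk.sdf[idx] * w + d * obs_weight) / (w + obs_weight)
        chunk.weight[idx] = min(w + obs_weight, MAX_WEIGHT)
        chunk.mesh_dirty = True
        return chunk.sdf[idx]
```

In a sketch like this, only chunks touched by depth observations ever occupy memory, and chunks flagged `mesh_dirty` are the only ones whose mesh segments would need to be regenerated, which is one plausible way to keep per-frame work proportional to the observed area.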
