Paper Information - iMAP: Implicit Mapping and Positioning in Real-Time

iMAP: Implicit Mapping and Positioning in Real-Time

We show for the first time that a multilayer perceptron (MLP) can serve as the only scene representation in a real-time SLAM system for a handheld RGB-D camera. Our network is trained in live operation without prior data, building a dense, scene-specific implicit 3D model of occupancy and colour which is also immediately used for tracking. Achieving real-time SLAM via continual training of a neural network against a live image stream requires significant innovation. Our iMAP algorithm uses a keyframe structure and a multi-processing computation flow, together with dynamic information-guided pixel sampling for speed, and runs tracking at 10 Hz and global map updating at 2 Hz. The advantages of an implicit MLP over standard dense SLAM techniques include efficient geometry representation with automatic detail control and smooth, plausible filling-in of unobserved regions such as the back surfaces of objects.
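The abstract does not spell out how information-guided pixel sampling works, so the following is only a minimal sketch of one plausible reading. It assumes the image is split into a regular grid of cells, that the current rendering loss of each cell stands in for "information", and that the pixel budget is spent in proportion to that loss. The function name `sample_pixels`, the grid layout, and all parameters are hypothetical and are not taken from the paper.

```python
import random


def sample_pixels(cell_losses, grid, image_size, total_samples, rng):
    """Draw pixel coordinates, favouring grid cells with a higher loss.

    cell_losses   -- one non-negative loss per cell, in row-major order (assumed input)
    grid          -- (rows, cols) of the cell grid (assumed layout)
    image_size    -- (height, width) in pixels
    total_samples -- number of pixels to draw
    rng           -- a random.Random instance, passed in for reproducibility
    """
    rows, cols = grid
    height, width = image_size
    cell_h, cell_w = height // rows, width // cols

    # With no loss signal at all, fall back to uniform sampling over cells.
    if sum(cell_losses) > 0:
        weights = list(cell_losses)
    else:
        weights = [1.0] * (rows * cols)

    # Pick a cell for every sample with probability proportional to its loss.
    cells = rng.choices(range(rows * cols), weights=weights, k=total_samples)

    pixels = []
    for idx in cells:
        r, c = divmod(idx, cols)
        # Within the chosen cell, pick a pixel uniformly at random.
        v = rng.randrange(r * cell_h, (r + 1) * cell_h)
        u = rng.randrange(c * cell_w, (c + 1) * cell_w)
        pixels.append((u, v))
    return pixels


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical 2x2 grid where only the bottom-right cell still has error.
    picked = sample_pixels([0.0, 0.0, 0.0, 1.0], (2, 2), (8, 8), 50, rng)
    print(len(picked), "pixels, all in the bottom-right cell:",
          all(u >= 4 and v >= 4 for u, v in picked))
```

Under these assumptions, regions that the model already explains well receive few samples, while poorly reconstructed regions receive most of the budget, which is one way a fixed per-frame sample count could be spent where it matters most.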
