Learning to reconstruct 3D structures for occupancy mapping from depth and color information

Real-world scenarios contain many structural patterns that, if appropriately extracted and modeled, can be used to mitigate problems caused by sensor failures and occlusions, while also improving planning in tasks such as navigation and grasping. This paper devises a novel unsupervised procedure that learns 3D structures from unorganized point clouds in the form of occupancy maps. Our framework learns unique and arbitrarily complex features using a Bayesian Convolutional Variational Auto-Encoder that compresses local information into a low-dimensional latent representation and then decodes it back to reconstruct the original scene. This reconstructive model is trained on features obtained automatically from a wide variety of scenarios to improve its generalization and interpolation capabilities. We show that the proposed framework recovers partially missing structures and reasons about occluded regions with high accuracy, while maintaining a detailed reconstruction of observed areas. To seamlessly combine this localized feature information into a single global structure, we employ a Hilbert Map, a recently proposed, robust and efficient occupancy mapping technique. Experiments are conducted on large-scale 2D and 3D datasets, and a study of various accuracy/speed trade-offs is provided to assess the limits of the proposed framework.
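
The abstract states that local reconstructions are fused into one global structure with a Hilbert Map. The following is a minimal sketch of only that fusion step, under loudly stated assumptions: following the published Hilbert map idea, occupancy is modeled as logistic regression over kernel features and fitted with stochastic gradient descent, but the RBF features against a fixed grid of inducing points, all hyperparameters, and the synthetic 2D wall scan are hypothetical choices, and `HilbertMapSketch` and `rbf_features` are illustrative names, not the authors' code. The auto-encoder stage is omitted; in the described framework its reconstructed local patches would presumably contribute additional occupied and free samples.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def rbf_features(point, centers, gamma):
    """Hypothetical feature map: RBF similarities to fixed inducing points."""
    x, y = point
    return [math.exp(-gamma * ((x - cx) ** 2 + (y - cy) ** 2)) for cx, cy in centers]


class HilbertMapSketch:
    """Logistic regression on RBF features, trained with plain SGD.

    Illustration only: the inducing-point grid, gamma, learning rate and
    L2 weight are assumptions, not values taken from the paper.
    """

    def __init__(self, centers, gamma=2.0, lr=0.5, l2=1e-5):
        self.centers = centers
        self.gamma = gamma
        self.lr = lr
        self.l2 = l2
        self.w = [0.0] * len(centers)
        self.b = 0.0

    def _prob(self, phi):
        z = self.b + sum(wi * pi for wi, pi in zip(self.w, phi))
        return sigmoid(z)

    def occupancy(self, point):
        """Model probability that an arbitrary query point is occupied."""
        return self._prob(rbf_features(point, self.centers, self.gamma))

    def fit(self, points, labels, epochs=200, seed=0):
        rng = random.Random(seed)
        data = list(zip(points, labels))
        for _ in range(epochs):
            rng.shuffle(data)
            for point, label in data:
                phi = rbf_features(point, self.centers, self.gamma)
                # d(cross-entropy loss)/d(logit) = p_hat - label
                err = self._prob(phi) - label
                self.w = [wi - self.lr * (err * pi + self.l2 * wi)
                          for wi, pi in zip(self.w, phi)]
                self.b -= self.lr * err
        return self


# Synthetic scan (assumption): sensor at the origin, a wall at x = 2.
# Ray endpoints are labelled occupied (1); points sampled along each
# ray before the hit are labelled free (0).
points, labels = [], []
for i in range(21):
    hit = (2.0, -1.0 + 0.1 * i)
    points.append(hit)
    labels.append(1)
    for t in (0.2, 0.4, 0.6, 0.8):
        points.append((t * hit[0], t * hit[1]))
        labels.append(0)

# Hypothetical 7 x 5 grid of inducing points covering the scene.
centers = [(0.5 * i, -1.0 + 0.5 * j) for i in range(7) for j in range(5)]
hmap = HilbertMapSketch(centers).fit(points, labels)
print(round(hmap.occupancy((2.0, 0.0)), 2), round(hmap.occupancy((0.5, 0.0)), 2))
```

Because the fitted map is a continuous function, `occupancy` can be queried at any coordinate rather than only at the training samples; far from every inducing point the features vanish and the prediction falls back to the learned bias term.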
