Real-time indoor scene reconstruction with Manhattan assumption

This paper presents a novel end-to-end system for real-time indoor scene reconstruction that outperforms traditional methods based on image feature points and on dense geometric correspondence when handling indoor scenes with little texture and few geometric features. Our method fully exploits the Manhattan assumption, i.e., that scenes consist mainly of planar surfaces whose normals are mutually orthogonal. Given an input depth frame, we first extract the dominant axes via principal component analysis, which incorporates the orthogonality prior and reduces the influence of noise. We then compute the positions of the dominant planes (such as walls, floor, and ceiling) along these axes using mean shift. Finally, we compute the camera orientation and reconstruct the scene with a fast scheme that matches the dominant axes and planes to those of the previous frame. We have tested our approach on several datasets and show that it outperforms some well-known existing methods in these experiments. Even with an unoptimized CPU implementation, our method meets the real-time requirement.
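To make the plane-extraction step more concrete, the following is a minimal sketch of how mean shift could locate dominant planes once a dominant axis is known. It is an illustration under stated assumptions, not the implementation used in this work: the function names (mean_shift_1d, merge_modes, plane_offsets), the Gaussian kernel, the bandwidth of 0.05, and the synthetic points are all hypothetical. Each 3D point is projected onto the axis, and the density modes of the resulting 1D offsets are taken as candidate plane positions.

```python
import math

def mean_shift_1d(values, bandwidth, iters=50, tol=1e-6):
    """Shift every sample toward its local density mode (Gaussian kernel)."""
    modes = list(values)
    for _ in range(iters):
        largest_move = 0.0
        shifted_modes = []
        for m in modes:
            # Kernel weight of every original sample relative to the current estimate.
            weights = [math.exp(-0.5 * ((v - m) / bandwidth) ** 2) for v in values]
            shifted = sum(w * v for w, v in zip(weights, values)) / sum(weights)
            largest_move = max(largest_move, abs(shifted - m))
            shifted_modes.append(shifted)
        modes = shifted_modes
        if largest_move < tol:
            break
    return modes

def merge_modes(modes, min_gap):
    """Group converged estimates closer than min_gap and average each group."""
    clusters = []
    for m in sorted(modes):
        if clusters and m - clusters[-1][-1] < min_gap:
            clusters[-1].append(m)
        else:
            clusters.append([m])
    return [sum(c) / len(c) for c in clusters]

def plane_offsets(points, axis, bandwidth=0.05):
    """Project points onto a unit axis and return the offsets of dominant planes."""
    projected = [sum(p_i * a_i for p_i, a_i in zip(p, axis)) for p in points]
    return merge_modes(mean_shift_1d(projected, bandwidth), min_gap=bandwidth)

# Hypothetical data: two parallel walls near x = 0 and x = 3 with small jitter.
jitter = [0.01 * ((i % 5) - 2) for i in range(20)]
points = [(0.0 + j, 0.1 * i, 0.5) for i, j in enumerate(jitter)]
points += [(3.0 + j, 0.1 * i, 0.5) for i, j in enumerate(jitter)]

# One offset per wall, sorted along the axis.
offsets = plane_offsets(points, axis=(1.0, 0.0, 0.0))
print(offsets)
```

In this sketch the bandwidth controls how close two parallel planes can be before they are merged into one, so it would need to be chosen relative to the sensor noise and the scale of the scene.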
