Paper Information - Simplifying Indoor Scenes for Real-Time Manipulation on Mobile Devices

Precise measurements of an indoor scene are important for several applications, e.g., augmented reality furniture placement, whereas geometric details are only needed up to a certain scale. Depth sensors provide a highly detailed reconstruction, but mobile phones cannot display and manipulate these models in real time because of the massive amount of data and their limited computational power. This paper aims to close this gap by simplifying indoor scenes. RGB-D input sequences are exploited to extract wall segments and object candidates. For each input frame, the walls, ground plane, and ceiling are estimated as plane segments, and object candidates are detected using a state-of-the-art object detector. The objects' correct poses and semantic types are obtained by exploiting a 3D CAD dataset and by introducing a Markov Random Field over time. Experiments on a variety of real indoor scenes show that the resulting models are practical to use on mobile phones, consume little memory, and preserve precise 3D measurements.

Martin Kampel | Patrick Wolf | Michael Hödlmoser
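
The abstract mentions a Markov Random Field over time for settling each object's pose and semantic type, but gives no potentials or inference method. As a rough illustration only, the sketch below assumes a simple chain MRF over per-frame labels with a Potts-style penalty for changing label between consecutive frames, solved exactly with the Viterbi algorithm. The function name `smooth_labels`, the `switch_penalty` parameter, and the example costs are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def smooth_labels(
    frame_costs: Sequence[Dict[str, float]],
    switch_penalty: float = 1.0,
) -> List[str]:
    """Minimum-cost labelling of a chain MRF, computed with Viterbi.

    frame_costs[t][label] is a unary cost (lower is better) for giving
    `label` to the tracked object in frame t. Changing the label between
    consecutive frames adds `switch_penalty` (an assumed Potts term).
    """
    if not frame_costs:
        return []
    labels = sorted(frame_costs[0])

    def pair(a: str, b: str) -> float:
        # Pairwise cost: zero if the label is kept, a fixed penalty otherwise.
        return 0.0 if a == b else switch_penalty

    # best[l]: minimal cost of any labelling that ends with label l.
    best = {l: frame_costs[0][l] for l in labels}
    back: List[Dict[str, str]] = []
    for costs in frame_costs[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for l in labels:
            prev = min(labels, key=lambda p: best[p] + pair(p, l))
            new_best[l] = best[prev] + pair(prev, l) + costs[l]
            pointers[l] = prev
        back.append(pointers)
        best = new_best

    # Backtrack from the cheapest final label to recover the full sequence.
    path = [min(labels, key=lambda l: best[l])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical detector costs for one object over four frames.
frames = [
    {"chair": 0.2, "table": 0.8},
    {"chair": 0.3, "table": 0.7},
    {"chair": 0.9, "table": 0.4},  # a single noisy frame favours "table"
    {"chair": 0.1, "table": 0.9},
]
print(smooth_labels(frames, switch_penalty=1.0))
print(smooth_labels(frames, switch_penalty=0.0))
```

With the penalty set to zero the result is just the per-frame best label, so the noisy third frame flips to "table". With a penalty of 1.0 the temporal term outweighs that single frame and the label stays "chair" throughout, which is the kind of consistency a temporal MRF is meant to provide.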
