Free your Camera: 3D Indoor Scene Understanding from Arbitrary Camera Motion

Many approaches to indoor scene understanding have been proposed, yet few combine structural reasoning with full motion estimation in a real-time-oriented framework. In this work we address the problem of estimating the 3D structural layout of complex and cluttered indoor scenes from monocular video sequences, where the observer can move freely in the surrounding space. We propose an effective probabilistic formulation that allows us to generate, evaluate and optimize layout hypotheses by integrating new image evidence as the observer moves. Compared to state-of-the-art work, our approach makes significantly fewer limiting assumptions about the scene and the observer (e.g., the Manhattan world assumption, known camera motion). We introduce a new challenging dataset and present an extensive experimental evaluation, which demonstrates that our formulation runs in near real time and outperforms state-of-the-art methods while operating under significantly less constrained conditions.
