Box in the Box: Joint 3D Layout and Object Reasoning from Single Images

In this paper we propose an approach to jointly infer the room layout and the objects present in the scene. Towards this goal, we propose a branch and bound algorithm that is guaranteed to retrieve the global optimum of the joint problem. The main difficulty lies in accounting for occlusion so as not to over-count the image evidence. We introduce a new decomposition method that generalizes integral geometry to triangular shapes and allows us to bound the different terms in constant time. We exploit both geometric cues and object detectors as image features and show large improvements in 2D and 3D object detection over state-of-the-art deformable part-based models.
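The constant-time bounds rest on the idea behind integral geometry: precompute cumulative sums once, then score any region with a handful of lookups. The sketch below shows only the classical axis-aligned case (a summed-area table), not the triangular decomposition described above. The function names are hypothetical, and the bounding step assumes nonnegative feature maps and a set of candidate boxes given as coordinate intervals, in the style of efficient subwindow search rather than the method of this paper.

```python
import numpy as np


def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table with a zero row and column prepended,
    so that ii[y, x] == img[:y, :x].sum()."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def box_sum(ii: np.ndarray, top: int, left: int, bottom: int, right: int) -> float:
    """Sum of img[top:bottom, left:right] with four lookups, i.e. O(1)
    regardless of the box size."""
    return float(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])


def set_upper_bound(ii, top_range, left_range, bottom_range, right_range) -> float:
    """Upper bound on box_sum over every box whose coordinates lie in the
    given inclusive (lo, hi) intervals. Assumes a NONNEGATIVE feature map:
    every box in the set is contained in the largest one, so its sum
    cannot exceed the sum over that largest box."""
    return box_sum(ii, top_range[0], left_range[0], bottom_range[1], right_range[1])


if __name__ == "__main__":
    img = np.arange(12, dtype=float).reshape(3, 4)
    ii = integral_image(img)
    print(box_sum(ii, 1, 1, 3, 3))  # 5 + 6 + 9 + 10 = 30.0
```

In a branch and bound search, a bound like `set_upper_bound` is evaluated for each set of candidate hypotheses; sets whose bound falls below the best score found so far are discarded, and the rest are split. Because each bound costs a constant number of lookups, the search stays cheap even when the hypothesis space is large.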
