3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patters called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2].

[1]  Rodney A. Brooks,et al.  Model-Based Three-Dimensional Interpretations of Two-Dimensional Images , 1981, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  David G. Lowe,et al.  Fitting Parameterized Three-Dimensional Models to Images , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[3]  Azriel Rosenfeld,et al.  3-D Shape Recovery Using Distributed Aspect Matching , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[4]  Takeo Kanade,et al.  A statistical method for 3D object detection applied to faces and cars , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[5]  J. Koenderink,et al.  The singularities of the visual mapping , 1976, Biological Cybernetics.

[6]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[7]  Luc Van Gool,et al.  Towards Multi-View Object Class Detection , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8]  Silvio Savarese,et al.  3D generic object categorization, localization and pose estimation , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[9]  Mubarak Shah,et al.  3D Model based Object Class Detection in An Arbitrary View , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[10]  Antonio Torralba,et al.  Sharing Visual Features for Multiclass and Multiview Object Detection , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Derek Hoiem,et al.  3D LayoutCRF for Multi-View Object Class Recognition and Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Cordelia Schmid,et al.  Flexible Object Models for Category-Level 3D Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Justus H. Piater,et al.  Probabilistic Pose Recovery Using Learned Hierarchical Object Models , 2009, ICVW.

[14]  Cordelia Schmid,et al.  Viewpoint-independent object class detection using 3D Feature Maps , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Thorsten Joachims,et al.  Learning structural SVMs with latent variables , 2009, ICML '09.

[16]  Silvio Savarese,et al.  A multi-view probabilistic model for 3D object classes , 2009, CVPR.

[17]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Silvio Savarese,et al.  Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[19]  A multi-view probabilistic model for 3D object classes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Vincent Lepetit,et al.  Dominant orientation templates for real-time detection of texture-less objects , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[21]  Michael Goesele,et al.  Back to the Future: Learning Shape Models from 3D CAD Data , 2010, BMVC.

[22]  Xiaofeng Ren,et al.  Discriminative Mixture-of-Templates for Viewpoint Classification , 2010, ECCV.

[23]  Silvio Savarese,et al.  Depth-Encoded Hough Voting for Joint Object Detection and Shape Recovery , 2010, ECCV.

[24]  David A. Forsyth,et al.  Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry , 2010, ECCV.

[25]  Ronen Basri,et al.  Viewpoint-aware object detection and pose estimation , 2011, 2011 International Conference on Computer Vision.

[26]  Silvio Savarese,et al.  Estimating the aspect layout of object categories , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Marc Pollefeys,et al.  Efficient structured prediction for 3D indoor scene understanding , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  Peter V. Gehler,et al.  Teaching 3D geometry to deformable part models , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.