Category-specific object reconstruction from a single image

Object reconstruction from a single image - in the wild - is a problem where we can make progress and get meaningful results today. This is the main message of this paper, which introduces an automated pipeline with pixels as inputs and 3D surfaces of various rigid categories as outputs in images of realistic scenes. At the core of our approach are deformable 3D models that can be learned from 2D annotations available in existing object detection datasets, that can be driven by noisy automatic object segmentations and which we complement with a bottom-up module for recovering high-frequency shape details. We perform a comprehensive quantitative analysis and ablation study of our approach using the recently introduced PASCAL 3D+ dataset and show very encouraging automatic reconstructions on PASCAL VOC.

[1]  Martial Hebert,et al.  3DNN: 3D Nearest Neighbor , 2014, International Journal of Computer Vision.

[2]  Edward H. Adelson,et al.  Playing with Puffball: simple scale-invariant inflation for use in vision and graphics , 2012, SAP '12.

[3]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[4]  Andrew W. Fitzgibbon,et al.  Finding nemo: Deformable object class modelling using curve matching , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  A. Torralba,et al.  How Little Do We Need for 3-D Shape Perception? , 2011, Perception.

[6]  A. Laurentini,et al.  The Visual Hull Concept for Silhouette-Based Image Understanding , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Derek Hoiem,et al.  Category Independent Object Proposals , 2010, ECCV.

[8]  Touradj Ebrahimi,et al.  MESH: measuring errors between surfaces using the Hausdorff distance , 2002, Proceedings. IEEE International Conference on Multimedia and Expo.

[9]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[10]  Lourdes Agapito,et al.  Dense Variational Reconstruction of Non-rigid Surfaces from Monocular Video , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[12]  Siddhartha Chaudhuri,et al.  A probabilistic model for component-based shape synthesis , 2012, ACM Trans. Graph..

[13]  Ira Kemelmacher-Shlizerman,et al.  Total Moving Face Reconstruction , 2014, ECCV.

[14]  Lourdes Agapito,et al.  Reconstructing PASCAL VOC , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Alexei A. Efros,et al.  Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics , 2010, ECCV.

[17]  Lawrence D. Jackel,et al.  Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.

[18]  Thomas Vetter,et al.  Face Recognition Based on Fitting a 3D Morphable Model , 2003, IEEE Trans. Pattern Anal. Mach. Intell..

[19]  Ira Kemelmacher-Shlizerman,et al.  Internet Based Morphable Model , 2013, 2013 IEEE International Conference on Computer Vision.

[20]  Li Zhang,et al.  Model evolution: An incremental approach to non-rigid structure from motion , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[21]  Cristian Sminchisescu,et al.  Constrained parametric min-cuts for automatic object segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[23]  Song Wu,et al.  3 D ShapeNets : A Deep Representation for Volumetric Shape Modeling , 2015 .

[24]  Jitendra Malik,et al.  Viewpoints and keypoints , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Jianxiong Xiao,et al.  Localizing 3D cuboids in single-view images , 2012, NIPS.

[26]  Yu Chen,et al.  Inferring 3D Shapes and Deformations from Single Views , 2010, ECCV.

[27]  Jitendra Malik,et al.  Shape, Illumination, and Reflectance from Shading , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Adrien Bartoli,et al.  Coarse-to-fine low-rank structure-from-motion , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Vladlen Koltun,et al.  Geodesic Object Proposals , 2014, ECCV.

[30]  Silvio Savarese,et al.  Beyond PASCAL: A benchmark for 3D object detection in the wild , 2014, IEEE Winter Conference on Applications of Computer Vision.

[31]  Henning Biermann,et al.  Recovering non-rigid 3D shape from image streams , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[32]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[33]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Aaron Hertzmann,et al.  Nonrigid Structure-from-Motion: Estimating Shape and Motion with Hierarchical Priors , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Silvio Savarese,et al.  Representations and Techniques for 3D Object Recognition and Scene Interpretation , 2011, Representations and Techniques for 3D Object Recognition and Scene Interpretation.

[36]  Yücel Yemez,et al.  A Surface Deformation Framework for 3D Shape Recovery , 2006, MRCS.

[37]  Jitendra Malik,et al.  Color Constancy, Intrinsic Images, and Shape Estimation , 2012, ECCV.

[38]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[39]  Koen E. A. van de Sande,et al.  Segmentation as selective search for object recognition , 2011, 2011 International Conference on Computer Vision.

[40]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[41]  Francis Schmitt,et al.  Silhouette and stereo fusion for 3D object modeling , 2003, Fourth International Conference on 3-D Digital Imaging and Modeling, 2003. 3DIM 2003. Proceedings..

[42]  Lourdes Agapito,et al.  Balloon Shapes: Reconstructing and Deforming Objects with Volume from Images , 2013, 2013 International Conference on 3D Vision.

[43]  Sebastian Thrun,et al.  SCAPE: shape completion and animation of people , 2005, SIGGRAPH '05.

[44]  Antonio Torralba,et al.  Parsing IKEA Objects: Fine Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[45]  Deva Ramanan,et al.  Analyzing 3D Objects in Cluttered Images , 2012, NIPS.

[46]  Ramakant Nevatia,et al.  Description and Recognition of Curved Objects , 1977, Artif. Intell..

[47]  Bernt Schiele,et al.  Detailed 3D Representations for Object Recognition and Modeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Francis Schmitt,et al.  Silhouette and stereo fusion for 3D object modeling , 2003, Fourth International Conference on 3-D Digital Imaging and Modeling, 2003. 3DIM 2003. Proceedings..

[49]  Andrew W. Fitzgibbon,et al.  What Shape Are Dolphins? Building 3D Morphable Models from 2D Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  Subhransu Maji,et al.  Detecting People Using Mutually Consistent Poselet Activations , 2010, ECCV.

[51]  ZissermanAndrew,et al.  The Pascal Visual Object Classes Challenge , 2015 .

[52]  Jianxiong Xiao,et al.  3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Lawrence G. Roberts,et al.  Machine Perception of Three-Dimensional Solids , 1963, Outstanding Dissertations in the Computer Sciences.

[54]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.