Lifting Object Detection Datasets into 3D

While data has certainly taken the center stage in computer vision in recent years, it can still be difficult to obtain in certain scenarios. In particular, acquiring ground truth 3D shapes of objects pictured in 2D images remains a challenging feat and this has hampered progress in recognition-based object reconstruction from a single image. Here we propose to bypass previous solutions such as 3D scanning or manual design, that scale poorly, and instead populate object category detection datasets semi-automatically with dense, per-object 3D reconstructions, bootstrapped from:(i) class labels, (ii) ground truth figure-ground segmentations and (iii) a small set of keypoint annotations. Our proposed algorithm first estimates camera viewpoint using rigid structure-from-motion and then reconstructs object shapes by optimizing over visual hull proposals guided by loose within-class shape similarity assumptions. The visual hull sampling process attempts to intersect an object's projection cone with the cones of minimal subsets of other similar objects among those pictured from certain vantage points. We show that our method is able to produce convincing per-object 3D reconstructions and to accurately estimate cameras viewpoints on one of the most challenging existing object-category detection datasets, PASCAL VOC. We hope that our results will re-stimulate interest on joint object recognition and 3D reconstruction from a single image.

[1]  Touradj Ebrahimi,et al.  MESH: measuring errors between surfaces using the Hausdorff distance , 2002, Proceedings. IEEE International Conference on Multimedia and Expo.

[2]  Cristian Sminchisescu,et al.  Semantic Segmentation with Second-Order Pooling , 2012, ECCV.

[3]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[4]  Joseph L. Mundy,et al.  Object Recognition in the Geometric Era: A Retrospective , 2006, Toward Category-Level Object Recognition.

[5]  J J Koenderink,et al.  Affine structure from motion. , 1991, Journal of the Optical Society of America. A, Optics and image science.

[6]  S. Ullman The interpretation of structure from motion , 1979, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[7]  Antonio Torralba,et al.  Parsing IKEA Objects: Fine Pose Estimation , 2013, 2013 IEEE International Conference on Computer Vision.

[8]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9]  Daniel Cremers,et al.  Multiview Stereo and Silhouette Consistency via Convex Functionals over Convex Domains , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Rui Yu,et al.  Video Pop-up: Monocular 3D Reconstruction of Dynamic Scenes , 2014, ECCV.

[11]  A. Laurentini,et al.  The Visual Hull Concept for Silhouette-Based Image Understanding , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  S. P. Mudur,et al.  Three-dimensional computer vision: a geometric viewpoint , 1993 .

[13]  Daniel Cremers,et al.  Generalized Connectivity Constraints for Spatio-temporal 3D Reconstruction , 2014, ECCV.

[14]  Niloy J. Mitra,et al.  Shadow art , 2009, ACM Trans. Graph..

[15]  Ronen Basri,et al.  Example Based 3D Reconstruction from Single 2D Images , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[16]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[17]  Jonathan T. Barron,et al.  Boundary Cues for 3D Object Shape Recovery , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Daniel Cremers,et al.  Relative Volume Constraints for Single View 3D Reconstruction , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Lourdes Agapito,et al.  Balloon Shapes: Reconstructing and Deforming Objects with Volume from Images , 2013, 2013 International Conference on 3D Vision.

[20]  Andrew Blake,et al.  Efficient Human Pose Estimation from Single Depth Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Lourdes Agapito,et al.  Factorization for non-rigid and articulated structure using metric projections , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Subhransu Maji,et al.  Object segmentation by alignment of poselet activations to image contours , 2011, CVPR 2011.

[23]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[24]  Sebastian Thrun,et al.  SCAPE: shape completion and animation of people , 2005, SIGGRAPH '05.

[25]  Ian D. Reid,et al.  Dense Reconstruction Using 3D Object Shape Priors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Marc Pollefeys,et al.  Pulling Things out of Perspective , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Ramakant Nevatia,et al.  Using Perceptual Organization to Extract 3-D Structures , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[28]  Silvio Savarese,et al.  Beyond PASCAL: A benchmark for 3D object detection in the wild , 2014, IEEE Winter Conference on Applications of Computer Vision.

[29]  Richard Szeliski,et al.  A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[30]  Henning Biermann,et al.  Recovering non-rigid 3D shape from image streams , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[31]  M. Turk,et al.  Eigenfaces for Recognition , 1991, Journal of Cognitive Neuroscience.

[32]  David G. Lowe,et al.  Three-Dimensional Object Recognition from Single Two-Dimensional Images , 1987, Artif. Intell..

[33]  Lourdes Agapito,et al.  Reconstructing PASCAL VOC , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[34]  Silvio Savarese,et al.  Dense Object Reconstruction with Semantic Priors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Lawrence G. Roberts,et al.  Machine Perception of Three-Dimensional Solids , 1963, Outstanding Dissertations in the Computer Sciences.

[36]  Cristian Sminchisescu,et al.  Object Recognition by Sequential Figure-Ground Ranking , 2011, International Journal of Computer Vision.

[37]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[38]  Silvio Savarese,et al.  Representations and Techniques for 3D Object Recognition and Scene Interpretation , 2011, Representations and Techniques for 3D Object Recognition and Scene Interpretation.

[39]  João Paulo Costeira,et al.  Estimating 3D shape from degenerate sequences with missing data , 2009, Comput. Vis. Image Underst..

[40]  Li Zhang,et al.  Model evolution: An incremental approach to non-rigid structure from motion , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[41]  Silvio Savarese,et al.  A multi-view probabilistic model for 3D object classes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Takeo Kanade,et al.  Shape and motion from image streams under orthography: a factorization method , 1992, International Journal of Computer Vision.

[43]  D. Marr,et al.  Representation and recognition of the spatial organization of three-dimensional shapes , 1978, Proceedings of the Royal Society of London. Series B. Biological Sciences.

[44]  Andrew W. Fitzgibbon,et al.  Finding nemo: Deformable object class modelling using curve matching , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[45]  Alexei A. Efros,et al.  Seeing 3D Chairs: Exemplar Part-Based 2D-3D Alignment Using a Large Dataset of CAD Models , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[46]  Thomas O. Binford,et al.  Computer Description of Curved Objects , 1973, IEEE Transactions on Computers.

[47]  Stephen DiVerdi,et al.  Learning part-based templates from large collections of 3D shapes , 2013, ACM Trans. Graph..

[48]  Antonio Torralba,et al.  Building a database of 3D scenes from user annotations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[49]  Leonidas J. Guibas,et al.  Estimating image depth using shape collections , 2014, ACM Trans. Graph..

[50]  Hiroshi Murase,et al.  Visual learning and recognition of 3-d objects from appearance , 2005, International Journal of Computer Vision.

[51]  Trevor Darrell,et al.  Inferring 3D structure with a statistical image-based shape model , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[52]  Matthew Turk,et al.  A Morphable Model For The Synthesis Of 3D Faces , 1999, SIGGRAPH.

[53]  Kristen Grauman,et al.  Shape Sharing for Object Segmentation , 2012, ECCV.

[54]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[55]  K NayarShree,et al.  Visual learning and recognition of 3-D objects from appearance , 1995 .

[56]  Andrew W. Fitzgibbon,et al.  What Shape Are Dolphins? Building 3D Morphable Models from 2D Images , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[57]  Alexei A. Efros,et al.  Geometric context from a single image , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[58]  Ashutosh Saxena,et al.  3-D Depth Reconstruction from a Single Still Image , 2007, International Journal of Computer Vision.

[59]  Edward H. Adelson,et al.  Playing with Puffball: simple scale-invariant inflation for use in vision and graphics , 2012, SAP '12.

[60]  William Grimson,et al.  Object recognition by computer - the role of geometric constraints , 1991 .

[61]  Vladimir Kolmogorov,et al.  Graph cut based image segmentation with connectivity priors , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[62]  Berthold K. P. Horn SHAPE FROM SHADING: A METHOD FOR OBTAINING THE SHAPE OF A SMOOTH OPAQUE OBJECT FROM ONE VIEW , 1970 .

[63]  Andrew W. Fitzgibbon,et al.  Single View Reconstruction of Curved Surfaces , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[64]  Bernt Schiele,et al.  Detailed 3D Representations for Object Recognition and Modeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[65]  Jitendra Malik,et al.  Shape, albedo, and illumination from a single image of an unknown object , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.