3D Structure from Detections

We present a novel method to infer, in closed-form, the 3D spatial occupancy of a collection of rigid objects given 2D image detections from a sequence of images. In particular, starting from 2D ellipses fitted to bounding boxes, this novel multi-view problem can be reformulated as the estimation of a quadric (ellipsoid) in 3D. We show that an efficient solution exists in the dual-space using a minimum of three views. This algebraic solution can be negatively affected in the presence of gross inaccuracies in the bounding boxes estimation. To this end, we also propose a robust regularization method and a robust ellipse fitting algorithm able to improve performance in the presence of errors in the detected objects. Results on synthetic tests and on different real datasets, involving real challenging scenarios, demonstrate the applicability and potential of our method.

[1]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[2]  Silvio Savarese,et al.  Multiple Target Tracking in World Coordinate with Single, Minimally Calibrated Camera , 2010, ECCV.

[3]  Silvio Savarese,et al.  Semantic structure from motion , 2011, CVPR 2011.

[4]  Reinhard Koch,et al.  Self-Calibration and Metric Reconstruction Inspite of Varying and Unknown Intrinsic Camera Parameters , 1999, International Journal of Computer Vision.

[5]  Konrad Schindler,et al.  Are Cars Just 3D Boxes? Jointly Estimating the 3D Shape of Multiple Objects , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Luigi di Stefano,et al.  Joint Detection, Tracking and Mapping by Semantic Bundle Adjustment , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Vincent Lepetit,et al.  Model Based Training, Detection and Pose Estimation of Texture-Less 3D Objects in Heavily Cluttered Scenes , 2012, ACCV.

[9]  Markus Vincze,et al.  Automation of “ground truth” annotation for multi-view RGB-D object instance recognition datasets , 2014, 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[10]  Andrew Zisserman,et al.  Quadric reconstruction from dual-space geometry , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[11]  S. R. Searle,et al.  Vec and vech operators for matrices, with some uses in jacobians and multivariate statistics , 1979 .

[12]  Alexei A. Efros,et al.  How Important Are "Deformable Parts" in the Deformable Parts Model? , 2012, ECCV Workshops.

[13]  Paul H. J. Kelly,et al.  SLAM++: Simultaneous Localisation and Mapping at the Level of Objects , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[15]  Reinhard Koch,et al.  Self-calibration and metric reconstruction in spite of varying and unknown internal camera parameters , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[16]  Peter F. Sturm,et al.  A Factorization Based Algorithm for Multi-Image Projective Structure and Motion , 1996, ECCV.

[17]  Brendan J. Frey,et al.  Stel component analysis: Modeling spatial correlations in image class structure , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Bernt Schiele,et al.  Detailed 3D Representations for Object Recognition and Modeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Alexei A. Efros,et al.  Putting Objects in Perspective , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[20]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[21]  Silvio Savarese,et al.  Estimating the aspect layout of object categories , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Jonathon Shlens,et al.  Fast, Accurate Detection of 100,000 Object Classes on a Single Machine , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Lourdes Agapito,et al.  Reconstructing PASCAL VOC , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Deva Ramanan,et al.  Analysis by Synthesis: 3D Object Recognition by Object Reconstruction , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Takeo Kanade,et al.  Shape and motion from image streams under orthography: a factorization method , 1992, International Journal of Computer Vision.