A hypothesize-and-bound algorithm for simultaneous object classification, pose estimation and 3D reconstruction from a single 2D image

We consider the problems of 3D reconstruction, pose estimation and object classification from a single 2D image. In sharp contrast with state-of-the-art methods that solve each of these problems separately or iteratively, we propose a mathematical framework that solves them jointly. Since the joint problem is ill-posed unless “prior knowledge” is considered, the proposed framework incorporates prior knowledge about the 3D shapes of different object classes. This knowledge is used to define a function L(H) that encodes how well each hypothesis H (object class and pose) “explains” the input image. To efficiently maximize L(H) without having to evaluate it exactly for each hypothesis H, we propose a hypothesize-and-bound (H&B) algorithm that computes and refines upper and lower bounds for L(H) at a much lower cost. In this way, suboptimal hypotheses are discarded with little computation. The resulting algorithm integrates information from the 2D image and the 3D prior, is efficient, and is guaranteed to find the optimal solution.
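
The bound-and-discard idea can be illustrated with a minimal toy sketch. It is not the authors' implementation: the function name `hypothesize_and_bound`, the `terms` dictionary, and the assumption that L(H) is a sum of per-hypothesis terms each lying in [0, 1] are all hypothetical choices made for illustration. Under that assumption, evaluating only the first k terms of a hypothesis gives a lower bound (the partial sum) and an upper bound (the partial sum plus the number of unevaluated terms). The loop always refines the hypothesis with the largest upper bound and stops once its lower bound is at least every other upper bound.

```python
from typing import Dict, List, Tuple


def hypothesize_and_bound(terms: Dict[str, List[float]]) -> Tuple[str, int]:
    """Return (best hypothesis, number of terms evaluated).

    Toy assumption (not from the paper): L(H) = sum(terms[H]),
    with every term in [0, 1] and at least one hypothesis given.
    """
    evaluated = {h: 0 for h in terms}   # terms evaluated so far, per hypothesis
    partial = {h: 0.0 for h in terms}   # exact sum of the evaluated terms
    active = list(terms)                # hypotheses not yet discarded
    cost = 0                            # total number of term evaluations

    def lower(h: str) -> float:
        # Unevaluated terms are at least 0.
        return partial[h]

    def upper(h: str) -> float:
        # Unevaluated terms are at most 1 each.
        return partial[h] + (len(terms[h]) - evaluated[h])

    while True:
        # Discard hypotheses that can no longer be optimal.
        best_lower = max(lower(h) for h in active)
        active = [h for h in active if upper(h) >= best_lower]

        # Focus on the most promising hypothesis.
        top = max(active, key=upper)
        if all(lower(top) >= upper(h) for h in active if h != top):
            return top, cost  # top provably maximizes L(H)

        # Refine the bounds of top by evaluating one more term.
        partial[top] += terms[top][evaluated[top]]
        evaluated[top] += 1
        cost += 1


# Hypothetical per-term scores for three object-class hypotheses.
example_terms = {
    "cup": [0.9, 0.8, 0.9, 0.7],
    "bottle": [0.1, 0.2, 0.1, 0.3],
    "chair": [0.2, 0.1, 0.0, 0.1],
}

if __name__ == "__main__":
    winner, cost = hypothesize_and_bound(example_terms)
    print(winner, cost)
```

In this toy setting the selected hypothesis is the one an exhaustive evaluation would pick, while the weaker hypotheses are discarded after only a few of their terms have been evaluated. The bounds used in the paper are derived from the image and the 3D shape priors, which this sketch does not attempt to model.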
