Depth from Familiar Objects: A Hierarchical Model for 3D Scenes

We develop an integrated, probabilistic model for the appearance and three-dimensional geometry of cluttered scenes. Object categories are modeled via distributions over the 3D location and appearance of visual features. Uncertainty in the number of object instances depicted in a particular image is then achieved via a transformed Dirichlet process. In contrast with image-based approaches to object recognition, we model scale variations as the perspective projection of objects in different 3D poses. To calibrate the underlying geometry, we incorporate binocular stereo images into the training process. A robust likelihood model accounts for outliers in matched stereo features, allowing effective learning of 3D object structure from partial 2D segmentations. Applied to a dataset of office scenes, our model detects objects at multiple scales via a coarse reconstruction of the corresponding 3D geometry.

[1]  David B. Dunson,et al.  Bayesian Data Analysis , 2010 .

[2]  M. Escobar,et al.  Bayesian Density Estimation and Inference Using Mixtures , 1995 .

[3]  M. Escobar,et al.  Markov Chain Sampling Methods for Dirichlet Process Mixture Models , 2000 .

[4]  Brendan J. Frey,et al.  Learning flexible sprites in video layers , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[5]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[6]  Antonio Torralba,et al.  Depth Estimation from Image Structure , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[8]  Zhuowen Tu,et al.  Image Parsing: Unifying Segmentation, Detection, and Recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[9]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[10]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[11]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[12]  Cordelia Schmid,et al.  A maximum entropy framework for part-based texture and object recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[13]  Antonio Torralba,et al.  Describing Visual Scenes using Transformed Dirichlet Processes , 2005, NIPS.

[14]  Peter Meer,et al.  Simultaneous multiple 3D motion estimation via mode finding on Lie groups , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[15]  Antonio Torralba,et al.  Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[16]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Alexei A. Efros,et al.  Geometric context from a single image , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18]  Ashutosh Saxena,et al.  Learning Depth from Single Monocular Images , 2005, NIPS.

[19]  Michael I. Jordan,et al.  Hierarchical Dirichlet Processes , 2006 .