A multi-view probabilistic model for 3D object classes

We propose a novel probabilistic framework for learning visual models of 3D object categories by combining appearance information and geometric constraints. Objects are represented as a coherent ensemble of parts that are consistent under 3D viewpoint transformations. Each part is a collection of salient image features. A generative framework is used for learning a model that captures the relative position of parts within each of the discretized viewpoints. In contrast to most existing mixture-of-viewpoints models, our model establishes explicit correspondences of parts across different viewpoints of the object class. Given a new image, detection and classification are achieved by determining the position and viewpoint of the model that maximize the recognition scores of the candidate objects. Our approach is among the first to propose a generative probabilistic framework for 3D object categorization. We test our algorithm on the detection and viewpoint classification tasks using the “car” category from both the Savarese et al. 2007 and PASCAL VOC 2006 datasets, and show promising results on both tasks on these two challenging datasets.
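The detection step described above, choosing the position and viewpoint that maximize a recognition score, can be illustrated with a minimal sketch. It assumes the scores have already been computed and stored in an array indexed by discretized viewpoint and image position; the function name `best_detection`, the array layout, and the toy scores are hypothetical and are not taken from the paper, which does not specify its scoring function here.

```python
import numpy as np


def best_detection(scores):
    """Return (viewpoint, (row, col), score) for the highest-scoring hypothesis.

    `scores` is assumed to have shape (n_viewpoints, height, width), with one
    recognition score per discretized viewpoint and candidate position.
    This layout is an illustrative assumption, not the paper's data structure.
    """
    scores = np.asarray(scores, dtype=float)
    # argmax over the flattened array, then recover the 3D index
    flat_index = np.argmax(scores)
    v, y, x = np.unravel_index(flat_index, scores.shape)
    return int(v), (int(y), int(x)), float(scores[v, y, x])


# Toy example: 8 viewpoints on a 4 x 5 grid of candidate positions.
rng = np.random.default_rng(0)
scores = rng.random((8, 4, 5))   # random scores in [0, 1)
scores[3, 2, 1] = 2.0            # plant a clear maximum for illustration
viewpoint, position, score = best_detection(scores)
```

In this toy setup the planted maximum is recovered jointly over viewpoint and position; a real detector would also need a threshold or non-maximum suppression to handle multiple objects, which this sketch omits.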

[1]  Cordelia Schmid,et al.  An Affine Invariant Interest Point Detector , 2002, ECCV.

[2]  Luc Van Gool,et al.  Towards Multi-View Object Class Detection , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[3]  Luc Van Gool,et al.  Simultaneous Object Recognition and Segmentation from Single or Multiple Model Views , 2006, International Journal of Computer Vision.

[4]  B. Schiele,et al.  Combined Object Categorization and Segmentation With an Implicit Shape Model , 2004.

[5]  Jitendra Malik,et al.  Shape matching and object recognition using low distortion correspondences , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[6]  J. Koenderink,et al.  The singularities of the visual mapping , 1976, Biological Cybernetics.

[7]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[8]  Deng Cai,et al.  Topic modeling with network regularization , 2008, WWW.

[9]  Pietro Perona,et al.  Viewpoint-invariant learning and detection of human heads , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[10]  Geoffrey E. Hinton,et al.  A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants , 1998, Learning in Graphical Models.

[11]  J. Sethuraman.  A Constructive Definition of Dirichlet Priors , 1991.

[12]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2004, ECCV.

[13]  Pietro Perona,et al.  Object class recognition by unsupervised scale-invariant learning , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[14]  Takeo Kanade,et al.  A statistical approach to 3D object detection applied to faces and cars , 2000.

[15]  Antonio Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, CVPR 2004.

[16]  Antonio Torralba,et al.  Describing Visual Scenes using Transformed Dirichlet Processes , 2005, NIPS.

[17]  Trevor Darrell,et al.  The pyramid match kernel: discriminative classification with sets of image features , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18]  Michael I. Jordan,et al.  Variational methods for the Dirichlet process , 2004, ICML.

[19]  Cordelia Schmid,et al.  Flexible Object Models for Category-Level 3D Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Derek Hoiem,et al.  3D LayoutCRF for Multi-View Object Class Recognition and Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Mubarak Shah,et al.  3D Model based Object Class Detection in An Arbitrary View , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[22]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[23]  Matthew A. Brown,et al.  Unsupervised 3D object recognition and reconstruction in unordered datasets , 2005, Fifth International Conference on 3-D Digital Imaging and Modeling (3DIM'05).

[24]  B. S. Manjunath,et al.  The multiRANSAC algorithm and its application to detect planar homographies , 2005, IEEE International Conference on Image Processing 2005.

[25]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput..

[26]  Leslie Pack Kaelbling,et al.  Virtual Training for Multi-View Object Class Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Yee Whye Teh,et al.  Collapsed Variational Dirichlet Process Mixture Models , 2007, IJCAI.

[28]  Cordelia Schmid,et al.  3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints , 2006, International Journal of Computer Vision.

[29]  Ronen Basri,et al.  Recognition by Linear Combinations of Models , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[30]  Silvio Savarese,et al.  3D generic object categorization, localization and pose estimation , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[31]  Cordelia Schmid,et al.  Viewpoint-independent object class detection using 3D Feature Maps , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Silvio Savarese,et al.  View Synthesis for Recognizing Unseen Poses of Object Classes , 2008, ECCV.

[33]  Peter Rockett,et al.  The Accuracy of Sub-Pixel Localisation in the Canny Edge Detector , 1999, BMVC.