Pose estimation for category specific multiview object localization

We propose an approach to overcome the two main challenges of 3D multiview object detection and localization: the variation of object features due to changes in viewpoint, and the variation in the size and aspect ratio of the object. Our approach proceeds in three steps. Given an initial bounding box of fixed size, we first refine its aspect ratio and size. We then predict the viewing angle, under the hypothesis that the bounding box actually contains an object instance. Finally, a classifier tuned to this particular viewpoint checks whether an instance is actually present. As a result, we can find object instances and estimate their poses without having to search over all window sizes and potential orientations. We train and evaluate our method on a new object database specifically tailored for this task, containing real-world objects imaged over a wide range of smoothly varying viewpoints and under significant lighting changes. We show that successively estimating the bounding box and then the viewpoint leads to better localization results.
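The three steps described above can be sketched as a small control-flow skeleton. This is only an illustrative outline, not the authors' implementation: the names `detect_at`, `refine_box`, `predict_view`, `view_classifiers`, and `threshold` are hypothetical, and the abstract does not specify the features, the regressors, the number of viewpoint bins, or the decision rule. The toy stand-ins at the bottom exist only to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# A box is (x, y, width, height); this representation is an assumption.
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box: Box        # refined bounding box
    view_bin: int   # index of the predicted viewpoint (hypothetical discretization)
    score: float    # output of the viewpoint-specific classifier


def detect_at(
    image: object,
    initial_box: Box,
    refine_box: Callable[[object, Box], Box],
    predict_view: Callable[[object, Box], int],
    view_classifiers: Dict[int, Callable[[object, Box], float]],
    threshold: float = 0.0,
) -> Optional[Detection]:
    """Run the three steps on one fixed-size initial window."""
    # Step 1: refine the size and aspect ratio of the initial box.
    box = refine_box(image, initial_box)
    # Step 2: predict the viewpoint, assuming the box contains an instance.
    view_bin = predict_view(image, box)
    # Step 3: only the classifier tuned to that viewpoint verifies the hypothesis.
    score = view_classifiers[view_bin](image, box)
    if score < threshold:
        return None
    return Detection(box=box, view_bin=view_bin, score=score)


# Toy stand-ins (hypothetical) so the skeleton can be executed.
def toy_refine(image: object, box: Box) -> Box:
    x, y, w, h = box
    return (x, y, w * 1.5, h)  # pretend the object is wider than the window


def toy_predict_view(image: object, box: Box) -> int:
    return 2  # pretend the object is seen from viewpoint bin 2


toy_classifiers = {2: lambda image, box: 0.9}

result = detect_at(None, (0.0, 0.0, 64.0, 64.0),
                   toy_refine, toy_predict_view, toy_classifiers,
                   threshold=0.5)
```

In this arrangement each initial window triggers one box refinement, one viewpoint prediction, and one classifier evaluation, which is how the sketch avoids an exhaustive search over window sizes and orientations.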
