Multiview feature distributions for object detection and continuous pose estimation

Abstract: This paper presents a multiview model of object categories that is applicable to virtually any type of image feature, together with methods that perform detection, localization and continuous pose estimation in novel scenes efficiently and in a unified manner. We represent appearance as distributions of low-level, fine-grained image features. Multiview models encode the appearance of objects at discrete viewpoints and, in addition, how these views deform into one another as the viewpoint varies continuously (as detected from optical flow between training examples). Using a measure of similarity between an arbitrary test image and such a model at chosen viewpoints, we perform all three tasks with a common method. We leverage the simplicity of low-level image features, such as points extracted along edges or coarse-scale gradients extracted densely over the image, by building probabilistic templates, i.e. distributions of features, learned from one or several training examples. We handle these distributions efficiently with probabilistic techniques such as kernel density estimation, Monte Carlo integration and importance sampling. We provide an extensive evaluation on a wide variety of benchmark datasets. On the “ETHZ Shape” dataset, with single (hand-drawn) and multiple training examples, performance is well above baseline methods and on par with a number of more task-specific methods. On more complex objects, notably the cars of the “3D Object” dataset of Savarese et al., we obtain detection rates of 92.5% and a pose estimation accuracy of 91%. On the “rotating cars” dataset of Ozuysal et al., we outperform the state of the art on continuous pose estimation. We also demonstrate particular capabilities on a novel dataset featuring non-textured objects of undistinctive shapes, whose pose can only be determined from shading, captured here by coarse-scale intensity gradients.
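To make the combination of kernel density estimation and Monte Carlo integration concrete, the following is a minimal sketch, not the paper's formulation, which the abstract does not spell out. It assumes features are plain 2D points, that both the model template and the test image are represented by isotropic Gaussian kernel density estimates with a shared bandwidth, and that similarity is the integral of the product of the two densities, estimated by drawing samples from the model density. The function names (`kde_eval`, `kde_sample`, `mc_similarity`) and all parameter values are hypothetical.

```python
import numpy as np

def kde_eval(points, queries, bandwidth):
    """Evaluate an isotropic Gaussian KDE built on `points` at each row of `queries`."""
    d = points.shape[1]
    diff = queries[:, None, :] - points[None, :, :]      # shape (n_queries, n_points, d)
    sq_dist = np.sum(diff ** 2, axis=2)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)   # Gaussian normalisation constant
    return np.mean(np.exp(-0.5 * sq_dist / bandwidth ** 2), axis=1) / norm

def kde_sample(points, n, bandwidth, rng):
    """Draw n samples from the KDE: pick a kernel centre, then add Gaussian noise."""
    idx = rng.integers(0, len(points), size=n)
    noise = rng.normal(scale=bandwidth, size=(n, points.shape[1]))
    return points[idx] + noise

def mc_similarity(model_pts, test_pts, bandwidth=0.05, n_samples=2000, seed=0):
    """Monte Carlo estimate of the integral of p_model(x) * p_test(x) over x.

    Samples are drawn from the model density, so the estimate is simply the
    mean of the test density evaluated at those samples.
    """
    rng = np.random.default_rng(seed)
    samples = kde_sample(model_pts, n_samples, bandwidth, rng)
    return float(np.mean(kde_eval(test_pts, samples, bandwidth)))

if __name__ == "__main__":
    # Toy "edge point" features: a circle as the model, and a shifted copy as a mismatch.
    theta = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
    model = 0.5 + 0.3 * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    shifted = model + np.array([0.4, 0.0])
    print("aligned:", mc_similarity(model, model))
    print("shifted:", mc_similarity(model, shifted))
```

Under these assumptions, an aligned test image scores higher than a displaced one, so evaluating the score over candidate locations or viewpoints would give a simple detection and pose search. Sampling from the model density rather than uniformly over the image concentrates the samples where the integrand is non-negligible, which is the basic idea behind importance sampling.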

[1]  Anurag Mittal,et al.  Multi-stage Contour Based Detection of Deformable Objects , 2008, ECCV.

[2]  Luc Van Gool,et al.  Towards Multi-View Object Class Detection , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[3]  Silvio Savarese,et al.  Estimating the aspect layout of object categories , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Justus H. Piater,et al.  Generalized Exemplar-Based Full Pose Estimation from 2D Images without Correspondences , 2012, 2012 International Conference on Digital Image Computing Techniques and Applications (DICTA).

[5]  Emilio L. Zapata,et al.  Load Balancing versus Occupancy Maximization on Graphics Processing Units: The Generalized Hough Transform as a Case Study , 2011, Int. J. High Perform. Comput. Appl.

[6]  Cordelia Schmid,et al.  Accurate Object Detection with Deformable Shape Models Learnt from Images , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Danica Kragic,et al.  Object recognition and pose estimation for robotic manipulation using color cooccurrence histograms , 2003, Proceedings 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003) (Cat. No.03CH37453).

[8]  Justus H. Piater,et al.  Modeling Pose/Appearance Relations for Improved Object Localization and Pose Estimation in 2D images , 2013, IbPRIA.

[9]  Subhransu Maji,et al.  Object detection using a max-margin Hough transform , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Justus H. Piater,et al.  Continuous Pose Estimation in 2D Images at Instance and Category Levels , 2013, 2013 International Conference on Computer and Robot Vision.

[11]  Cordelia Schmid,et al.  3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints , 2006, International Journal of Computer Vision.

[12]  Jiri Matas,et al.  Robust wide-baseline stereo from maximally stable extremal regions , 2004, Image Vis. Comput.

[13]  David G. Lowe.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[14]  Joseph J. Lim,et al.  Recognition using regions , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  P. Fua,et al.  Pose estimation for category specific multiview object localization , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Jitendra Malik,et al.  Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17]  Cordelia Schmid,et al.  Bandit Algorithms for Tree Search , 2007, UAI.

[18]  Bernt Schiele,et al.  Revisiting 3D geometric models for accurate object shape and pose , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[19]  Henk Corporaal,et al.  Fast Hough Transform on GPUs: Exploration of Algorithm Trade-Offs , 2011, ACIVS.

[20]  Silvio Savarese,et al.  Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[21]  Alfred O. Hero,et al.  Robust object pose estimation via statistical manifold modeling , 2011, 2011 International Conference on Computer Vision.

[22]  Bernt Schiele,et al.  Robust Object Detection with Interleaved Categorization and Segmentation , 2008, International Journal of Computer Vision.

[23]  Andrew Zisserman,et al.  Learning an Alphabet of Shape and Appearance for Multi-Class Object Detection , 2008, International Journal of Computer Vision.

[24]  Luc Van Gool,et al.  PRISM: PRincipled Implicit Shape Model , 2009, BMVC.

[25]  Björn Johansson,et al.  Comparison of local image descriptors for full 6 degree-of-freedom pose estimation , 2009, 2009 IEEE International Conference on Robotics and Automation.

[26]  Antonio Torralba,et al.  Sharing features: efficient boosting procedures for multiclass object detection , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[27]  Sinisa Todorovic,et al.  From contours to 3D object detection and pose estimation , 2011, 2011 International Conference on Computer Vision.

[29]  Cordelia Schmid,et al.  Multi-view object class detection with a 3D geometric model , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[30]  Takeo Kanade,et al.  A statistical method for 3D object detection applied to faces and cars , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[31]  Sudhakar Sah,et al.  A Fast and Accurate GHT Implementation on CUDA , 2013 .

[32]  Rama Chellappa,et al.  Fast directional chamfer matching , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[33]  Lance Williams,et al.  View Interpolation for Image Synthesis , 1993, SIGGRAPH.

[34]  Ronen Basri,et al.  Constructing implicit 3D shape models for pose estimation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[35]  Justus H. Piater,et al.  A Probabilistic Framework for 3D Visual Object Representation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Silvio Savarese,et al.  3D generic object categorization, localization and pose estimation , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[37]  Ahmed M. Elgammal,et al.  Regression from local features for viewpoint and pose estimation , 2011, 2011 International Conference on Computer Vision.

[38]  Christopher D. Manning,et al.  Introduction to Information Retrieval , 2010, J. Assoc. Inf. Sci. Technol..

[39]  Dana H. Ballard,et al.  Generalizing the Hough transform to detect arbitrary shapes , 1981, Pattern Recognit.

[40]  Ronen Basri,et al.  Viewpoint-aware object detection and continuous pose estimation , 2012, Image Vis. Comput.

[41]  Rita Cucchiara,et al.  People Orientation Recognition by Mixtures of Wrapped Distributions on Random Trees , 2012, ECCV.

[42]  Derek Hoiem,et al.  3D LayoutCRF for Multi-View Object Class Recognition and Segmentation , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[43]  Luc Van Gool,et al.  Object Detection by Contour Segment Networks , 2006, ECCV.

[44]  Mubarak Shah,et al.  3D Model based Object Class Detection in An Arbitrary View , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[45]  Siddhartha S. Srinivasa,et al.  MOPED: A scalable and low latency object recognition and pose estimation system , 2010, 2010 IEEE International Conference on Robotics and Automation.

[46]  Bernt Schiele,et al.  An Implicit Shape Model for Combined Object Categorization and Segmentation , 2006, Toward Category-Level Object Recognition.

[47]  Avinash C. Kak,et al.  Calculating the 3d-pose of rigid-objects using active appearance models , 2004, IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA '04. 2004.

[48]  David G. Lowe,et al.  Probabilistic Models of Appearance for 3-D Object Recognition , 2000, International Journal of Computer Vision.

[49]  Cordelia Schmid,et al.  Flexible Object Models for Category-Level 3D Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Christoph H. Lampert,et al.  Beyond sliding windows: Object localization by efficient subwindow search , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[51]  Ronen Basri,et al.  Viewpoint-aware object detection and pose estimation , 2011, 2011 International Conference on Computer Vision.

[52]  Björn Johansson,et al.  Patch-duplets for object recognition and pose estimation , 2005, The 2nd Canadian Conference on Computer and Robot Vision (CRV'05).

[53]  Silvio Savarese,et al.  View Synthesis for Recognizing Unseen Poses of Object Classes , 2008, ECCV.

[54]  Christian Perwass,et al.  Increasing pose estimation performance using multi-cue integration , 2006, Proceedings 2006 IEEE International Conference on Robotics and Automation, 2006. ICRA 2006..

[55]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[56]  Frédéric Jurie,et al.  Groups of Adjacent Contour Segments for Object Detection , 2008, IEEE Trans. Pattern Anal. Mach. Intell.

[57]  Amnon Shashua,et al.  Novel view synthesis in tensor space , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[58]  Pietro Perona,et al.  Viewpoint-invariant learning and detection of human heads , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[59]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60]  Steven M. Seitz,et al.  Toward image-based scene representation using view morphing , 1996, Proceedings of 13th International Conference on Pattern Recognition.

[61]  Björn Ommer,et al.  From Meaningful Contours to Discriminative Object Shape , 2012, ECCV.

[62]  Silvio Savarese,et al.  Depth-Encoded Hough Voting for Joint Object Detection and Shape Recovery , 2010, ECCV.

[63]  Andrew Blake,et al.  Multiscale Categorical Object Recognition Using Contour Fragments , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[64]  R. Caflisch Monte Carlo and quasi-Monte Carlo methods , 1998, Acta Numerica.

[65]  Silvio Savarese,et al.  A multi-view probabilistic model for 3D object classes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.